Brazilian patent BR112013020220B1: METHOD FOR DETERMINING THE PLOIDY STATE OF A CHROMOSOME IN A GESTATING FETUS

Abstract:
Method for determining the ploidy state of a chromosome in a gestating fetus. The present invention relates to methods for determining the ploidy state of a chromosome in a gestating fetus from genotypic data measured on a mixed sample of DNA comprising DNA from both the mother of the fetus and the fetus, and optionally from genotypic data of the mother and father. The ploidy state is determined by using a joint distribution model to create expected allele distributions for the different possible fetal ploidy states, given the parental genotypic data, comparing the expected allele distributions with the allele distribution pattern measured in the mixed sample, and choosing the ploidy state whose expected allele distribution pattern most closely matches the observed pattern. The mixed DNA sample can be preferentially enriched at the various polymorphic loci in a way that minimizes allelic bias, for example using massively multiplexed targeted PCR.
Publication number: BR112013020220B1
Application number: R112013020220-3
Filing date: 2011-11-18
Publication date: 2020-03-17
Inventors: Matthew Rabinowitz; George Gemelos; Milena Banjevic; Allison Ryan; Zachary Demko; Matthew Hill; Bernhard Zimmermann; Johan Baner
Applicant: Natera, Inc.
Main IPC:

Description:

Invention Patent Descriptive Report for "METHOD FOR DETERMINING THE PLOIDY STATE OF A CHROMOSOME IN A GESTATING FETUS".
RELATED APPLICATIONS
[0001] This application claims the benefit of US Provisional Application Serial No. 61/462,972, filed February 9, 2011; US Provisional Application Serial No. 61/448,547, filed March 2, 2011; US Provisional Application Serial No. 61/516,996, filed April 12, 2011; US Utility Application Serial No. 13/110,685, filed May 18, 2011; and US Provisional Application Serial No. 61/571,248, filed June 23, 2011. All of these applications are incorporated herein by reference.
FIELD
[0002] The present invention relates to methods for non-invasive prenatal ploidy determination.
BACKGROUND OF THE INVENTION
[0003] Current methods of prenatal diagnosis can alert physicians and parents to abnormalities in growing fetuses. Without prenatal diagnosis, one in 50 babies is born with a serious physical or mental disability, and as many as one in 30 will have some form of congenital malformation. Unfortunately, standard methods either have poor accuracy or involve an invasive procedure that carries a risk of miscarriage. Methods based on maternal blood hormone levels or ultrasound measurements are non-invasive, but they have low accuracy. Methods such as amniocentesis, chorionic villus biopsy and fetal blood sampling are highly accurate, but are invasive and carry significant risks. Amniocentesis has been performed in approximately 3% of all pregnancies in the United States, although its frequency of use has decreased over the past decade.
[0004] It has recently been discovered that cell-free fetal DNA and intact fetal cells can enter the maternal bloodstream. Consequently, analysis of this genetic material may allow early Non-Invasive Prenatal Genetic Diagnosis (NPD).
[0005] Normal humans have two sets of 23 chromosomes in every healthy diploid cell, with one copy coming from each parent. Aneuploidy, a condition in which a nucleated cell contains too many and/or too few chromosomes, is believed to be responsible for a large percentage of failed implantations, miscarriages, and genetic diseases. Detection of chromosomal abnormalities can identify individuals or embryos with conditions such as Down syndrome, Klinefelter syndrome, and Turner syndrome, among others, in addition to increasing the chances of a successful pregnancy. Testing for chromosomal abnormalities is especially important as the mother's age increases: between the ages of 35 and 40 it is estimated that at least 40% of embryos are abnormal, and above the age of 40 more than half of the embryos are abnormal.
Some tests used for prenatal screening
[0006] Low levels of pregnancy-associated plasma protein A (PAPP-A), measured in maternal serum during the first trimester, may be associated with fetal chromosomal abnormalities including trisomies 13, 18 and 21. In addition, low PAPP-A levels in the first trimester may predict an adverse pregnancy outcome, including a small-for-gestational-age (SGA) baby or stillbirth. Pregnant women often undergo the first-trimester maternal serum screening test, which usually involves testing blood levels of the hormones PAPP-A and beta human chorionic gonadotropin (beta-hCG). In some cases, women also undergo an ultrasound to look for possible physiological defects. In particular, the nuchal translucency (NT) measurement can indicate a risk of aneuploidy in a fetus. In many areas, the standard of care for prenatal testing includes the first-trimester maternal serum screening test combined with an NT test.
[0007] The triple test, also called the triple screen, the Kettering test or the Bart's test, is an investigation performed during the second trimester of pregnancy to classify a patient as being at either high risk or low risk of chromosomal abnormalities (and neural tube defects). The term "multiple-marker screening test" is sometimes used instead. The term "triple test" can encompass the terms "double test", "quadruple test", "quad test" and "penta test".
[0008] The triple test measures serum levels of alpha-fetoprotein (AFP), unconjugated estriol (UE3), beta human chorionic gonadotropin (beta-hCG), invasive trophoblast antigen (ITA) and/or inhibin. A positive test means a high risk of chromosomal abnormalities (and neural tube defects), and such patients are then referred for more sensitive and specific procedures to receive a definitive diagnosis, especially invasive procedures such as amniocentesis. The triple test can be used to screen for a number of conditions, including trisomy 21 (Down syndrome). In addition to Down syndrome, the triple and quadruple tests screen for trisomy 18, also known as Edwards syndrome, and for open neural tube defects, and may also detect an increased risk of Turner syndrome, triploidy, trisomy 16 mosaicism, fetal death, Smith-Lemli-Opitz syndrome, and steroid sulfatase deficiency.
SUMMARY
[0009] Methods for determining the ploidy state of a chromosome in a gestating fetus are described herein. According to aspects illustrated herein, in one embodiment, a method for determining the ploidy state of a chromosome in a gestating fetus includes: obtaining a first DNA sample comprising maternal DNA from the mother of the fetus and fetal DNA from the fetus; preparing the first sample by isolating the DNA so as to obtain a prepared sample; measuring the DNA in the prepared sample at a plurality of polymorphic loci on the chromosome; calculating, on a computer, allele counts at the polymorphic loci from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses, each pertaining to a different possible ploidy state of the chromosome; building, on a computer, a joint distribution model for the expected allele counts at the polymorphic loci on the chromosome for each ploidy hypothesis; determining, on a computer, a relative probability of each ploidy hypothesis using the joint distribution model and the allele counts measured on the prepared sample; and determining the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability.
[00010] In some embodiments, the DNA in the first sample originates from maternal plasma. In some embodiments, preparing the first sample further comprises amplifying the DNA. In some embodiments, preparing the first sample further comprises preferentially enriching the DNA in the first sample at a plurality of polymorphic loci.
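The computational steps of this embodiment can be sketched in a few lines. The sketch below is only an illustration under stated assumptions, not the implementation described in this patent: it assumes that each ploidy hypothesis has already been reduced to an expected fraction of one allele at every locus, that loci are independent, that observed allele counts are binomially distributed, and that all hypotheses are equally likely a priori. The function names are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at p = 0 or 1
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)

def relative_probabilities(counts, expected):
    """counts: list of (allele_reads, total_reads), one entry per locus.
    expected: dict hypothesis -> list of expected allele fractions per locus.
    Returns dict hypothesis -> normalized probability (uniform prior assumed)."""
    log_lik = {
        h: sum(log_binom_pmf(a, n, p) for (a, n), p in zip(counts, fracs))
        for h, fracs in expected.items()
    }
    top = max(log_lik.values())  # subtract the maximum to avoid underflow
    weights = {h: math.exp(v - top) for h, v in log_lik.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

def call_ploidy(counts, expected):
    """Select the hypothesis with the greatest relative probability."""
    probs = relative_probabilities(counts, expected)
    return max(probs, key=probs.get), probs
```

Working in log space and normalizing at the end keeps the comparison stable when many loci are combined, since the raw likelihoods quickly become too small to represent directly.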
[00011] In some embodiments, preferentially enriching the DNA in the first sample at the plurality of polymorphic loci includes obtaining a plurality of pre-circularized probes, where each probe targets one of the polymorphic loci, and where the 3' and 5' ends of the probes are designed to hybridize to a region of DNA that is separated from the polymorphic site of the locus by a small number of bases, where the small number is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, 31 to 60, or a combination thereof; hybridizing the pre-circularized probes to DNA from the first sample; filling the gap between the hybridized probe ends using DNA polymerase; circularizing the pre-circularized probe; and amplifying the circularized probe.
[00012] In some embodiments, preferentially enriching the DNA at the plurality of polymorphic loci includes obtaining a plurality of ligation-mediated PCR probes, where each PCR probe targets one of the polymorphic loci, and where the upstream and downstream PCR probes are designed to hybridize to a region of DNA, on one strand of DNA, that is separated from the polymorphic site of the locus by a small number of bases, where the small number is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, 31 to 60, or a combination thereof; hybridizing the ligation-mediated PCR probes to the DNA from the first sample; filling the gap between the ligation-mediated PCR probe ends using DNA polymerase; ligating the ligation-mediated PCR probes; and amplifying the ligated ligation-mediated PCR probes.
[00013] In some embodiments, preferentially enriching the DNA at the plurality of polymorphic loci includes obtaining a plurality of hybrid capture probes that target the polymorphic loci, hybridizing the hybrid capture probes to the DNA in the first sample, and physically removing some or all of the unhybridized DNA from the first DNA sample.
[00014] In some embodiments, the hybrid capture probes are designed to hybridize to a region that flanks but does not overlap the polymorphic site. In some embodiments, the hybrid capture probes are designed to hybridize to a region that flanks but does not overlap the polymorphic site, and the length of the flanking capture probe may be selected from the group consisting of less than approximately 120 bases, less than approximately 110 bases, less than approximately 100 bases, less than approximately 90 bases, less than approximately 80 bases, less than approximately 70 bases, less than approximately 60 bases, less than approximately 50 bases, less than approximately 40 bases, less than approximately 30 bases, and less than approximately 25 bases. In some embodiments, the hybrid capture probes are designed to hybridize to a region that overlaps the polymorphic site, where the plurality of hybrid capture probes comprises at least two hybrid capture probes for each polymorphic locus, and where each hybrid capture probe is designed to be complementary to a different allele at that polymorphic locus.
[00015] In some embodiments, preferentially enriching the DNA at a plurality of polymorphic loci includes obtaining a plurality of internal forward primers, where each primer targets one of the polymorphic loci, and where the 3' end of the internal forward primers is designed to hybridize to a region of DNA upstream of the polymorphic site and separated from the polymorphic site by a small number of bases, where the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, and 31 to 60 base pairs; optionally obtaining a plurality of internal reverse primers, where each primer targets one of the polymorphic loci, and where the 3' end of the internal reverse primers is designed to hybridize to a region of DNA upstream of the polymorphic site and separated from the polymorphic site by a small number of bases, where the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, and 31 to 60 base pairs; hybridizing the internal primers to the DNA; and amplifying the DNA using the polymerase chain reaction to form amplicons.
[00016] In some embodiments, the method also includes obtaining a plurality of external forward primers, where each primer targets one of the polymorphic loci, and where the external forward primers are designed to hybridize to the region of DNA upstream of the internal forward primer; optionally obtaining a plurality of external reverse primers, where each primer targets one of the polymorphic loci, and where the external reverse primers are designed to hybridize to the region of DNA immediately downstream of the internal reverse primer; hybridizing the first primers to the DNA; and amplifying the DNA using the polymerase chain reaction.
[00017] In some embodiments, the method also includes obtaining a plurality of external reverse primers, where each primer targets one of the polymorphic loci, and where the external reverse primers are designed to hybridize to the region of DNA immediately downstream of the internal reverse primer; optionally obtaining a plurality of external forward primers, where each primer targets one of the polymorphic loci, and where the external forward primers are designed to hybridize to the region of DNA upstream of the internal forward primer; hybridizing the first primers to the DNA; and amplifying the DNA using the polymerase chain reaction.
[00018] In some embodiments, preparing the first sample further includes attaching universal adapters to the DNA in the first sample and amplifying the DNA in the first sample using the polymerase chain reaction. In some embodiments, at least a fraction of the amplicons that are amplified are less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp, where the fraction is 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or 99%.
[00019] In some embodiments, the DNA amplification is done in one or more individual reaction volumes, where each individual reaction volume contains more than 100, more than 200, more than 500, more than 1,000, more than 2,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, or more than 100,000 different forward and reverse primer pairs.
[00020] In some embodiments, preparing the first sample further comprises dividing the first sample into a plurality of portions, where the DNA in each portion is preferentially enriched at a subset of the plurality of polymorphic loci. In some embodiments, the internal primers are selected by identifying primer pairs likely to form undesired primer duplexes and removing from the plurality of primers at least one primer of each pair identified as likely to form undesired primer duplexes. In some embodiments, the internal primers contain a region that is designed to hybridize either upstream or downstream of the targeted polymorphic locus, and optionally contain a universal priming sequence designed to allow PCR amplification. In some embodiments, at least some of the primers additionally contain a random region that differs for each individual primer molecule. In some embodiments, at least some of the primers additionally contain a molecular barcode.
[00021] In some embodiments, the method also includes obtaining genotypic data from one or both parents of the fetus. In some embodiments, obtaining the genotypic data from one or both parents of the fetus includes preparing the DNA of the parents, where the preparation comprises preferentially enriching the DNA at the plurality of polymorphic loci to obtain prepared parental DNA, optionally amplifying the prepared parental DNA, and measuring the parental DNA in the prepared sample at the plurality of polymorphic loci.
[00022] In some embodiments, building the joint distribution model for the expected allele count probabilities at the plurality of polymorphic loci is done using genetic data obtained from one or both parents. In some embodiments, the first sample has been isolated from maternal plasma, and obtaining the mother's genotypic data is done by estimating the maternal genotypic data from the DNA measurements made on the prepared sample.
[00023] In some embodiments, the preferential enrichment results in an average degree of allelic bias between the prepared sample and the first sample of a factor selected from the group consisting of no more than a factor of 2, no more than a factor of 1.5, no more than a factor of 1.2, no more than a factor of 1.1, no more than a factor of 1.05, no more than a factor of 1.02, no more than a factor of 1.01, no more than a factor of 1.005, no more than a factor of 1.002, no more than a factor of 1.001, and no more than a factor of 1.0001. In some embodiments, the plurality of polymorphic loci are SNPs. In some embodiments, measuring the DNA in the prepared sample is done by sequencing.
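One way to make the notion of an allelic bias factor concrete is shown below. This is a hedged sketch: the text does not define how the factor is computed, so the definition used here (the fold change in the A:B allele ratio between the first sample and the prepared sample, expressed as a number of at least 1) is an assumption, as are the function names.

```python
def allelic_bias_factor(before, after):
    """before, after: (a_count, b_count) at one locus in the first sample
    and in the prepared sample. Assumes all counts are nonzero.
    Returns a factor >= 1, where 1.0 means the allele ratio is unchanged."""
    ratio_before = before[0] / before[1]
    ratio_after = after[0] / after[1]
    fold = ratio_after / ratio_before
    return max(fold, 1 / fold)  # direction of the bias is ignored

def average_bias(loci):
    """loci: list of (before, after) pairs. Returns the mean bias factor."""
    factors = [allelic_bias_factor(b, a) for b, a in loci]
    return sum(factors) / len(factors)
```

Under this definition, a locus whose A:B ratio moves from 1.0 to 1.1 and a locus whose ratio moves from 1.0 to 1/1.1 both have a bias factor of 1.1.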
[00024] In some embodiments, a diagnostic box is described for helping to determine the ploidy state of a chromosome in a gestating fetus, where the diagnostic box is capable of executing the preparing and measuring steps of the method of claim 1.
[00025] In some embodiments, the allele counts are probabilistic rather than binary. In some embodiments, the DNA measurements made on the prepared sample at the plurality of polymorphic loci are also used to determine whether or not the fetus has inherited one or more disease-linked haplotypes.
[00026] In some embodiments, building the joint distribution model for the allele count probabilities is done using data on the probability of chromosomes crossing over at different locations on a chromosome, in order to model the dependence between polymorphic alleles on the chromosome. In some embodiments, building the joint distribution model for allele counts and determining the relative probability of each hypothesis are done using a method that does not require the use of a reference chromosome.
[00027] In some embodiments, determining the relative probability of each hypothesis makes use of an estimated fraction of fetal DNA in the prepared sample. In some embodiments, the DNA measurements from the prepared sample that are used to calculate allele count probabilities and to determine the relative probability of each hypothesis comprise primary genetic data. In some embodiments, selecting the ploidy state corresponding to the hypothesis with the greatest probability is carried out using maximum likelihood estimates or maximum a posteriori estimates.
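The difference between the two estimates mentioned above can be illustrated briefly. The sketch below assumes that a log-likelihood has already been computed for each hypothesis; the prior probabilities are hypothetical inputs that a user would have to supply (for example, from population incidence), and the function name is not taken from the source.

```python
import math

def ml_and_map(log_likelihoods, priors):
    """log_likelihoods: dict hypothesis -> log P(data | hypothesis).
    priors: dict hypothesis -> prior probability (hypothetical values).
    Returns (maximum likelihood choice, maximum a posteriori choice)."""
    ml_choice = max(log_likelihoods, key=log_likelihoods.get)
    # The posterior is proportional to likelihood times prior.
    log_post = {h: ll + math.log(priors[h]) for h, ll in log_likelihoods.items()}
    map_choice = max(log_post, key=log_post.get)
    return ml_choice, map_choice
```

When the data only weakly favor one hypothesis, a strong prior can change the selection, which is why the two estimates may disagree.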
[00028] In some embodiments, determining the ploidy state of the fetus also includes combining the relative probabilities of each of the ploidy hypotheses determined using the joint distribution model and the allele count probabilities with relative probabilities of each of the ploidy hypotheses that are calculated using statistical techniques taken from the group consisting of a read count analysis, a comparison of heterozygosity rates, a statistic that is only available when parental genetic information is used, the probability of normalized genotype signals for certain parental contexts, a statistic that is calculated using an estimated fetal fraction of the first sample or the prepared sample, and combinations thereof.
[00029] In some embodiments, a confidence estimate is calculated for the determined ploidy state. In some embodiments, the method also includes taking a clinical action based on the determined ploidy state of the fetus, where the clinical action is selected from terminating the pregnancy or maintaining the pregnancy.
[00030] In some embodiments, the method can be performed for fetuses at between 4 and 5 weeks of gestation, between 5 and 6 weeks of gestation, between 6 and 7 weeks of gestation, between 7 and 8 weeks of gestation, between 8 and 9 weeks of gestation, between 9 and 10 weeks of gestation, between 10 and 12 weeks of gestation, between 12 and 14 weeks of gestation, between 14 and 20 weeks of gestation, between 20 and 40 weeks of gestation, in the first trimester, in the second trimester, in the third trimester, or combinations thereof.
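A simple way to combine relative probabilities from several techniques is sketched below. The text does not specify the combination rule, so this example makes an explicit assumption: the techniques are treated as conditionally independent given the hypothesis, so their probabilities are multiplied and then renormalized. The function name is hypothetical.

```python
def combine(*prob_sets):
    """Each argument is a dict hypothesis -> relative probability produced by
    one technique. Assumes the techniques are conditionally independent given
    the hypothesis, so the probabilities multiply; the product is renormalized."""
    hypotheses = list(prob_sets[0])
    product = {h: 1.0 for h in hypotheses}
    for probs in prob_sets:
        for h in hypotheses:
            product[h] *= probs[h]
    total = sum(product.values())
    return {h: v / total for h, v in product.items()}
```

If two techniques share a source of error, the independence assumption overstates the combined confidence, so the result should be read as an upper bound on certainty in that case.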
[00031] In some embodiments, a report showing a determined ploidy state of a chromosome in a gestating fetus is generated using the method. In some embodiments, a kit is described for determining the ploidy state of a target chromosome in a gestating fetus, designed for use with the method of claim 9. The kit includes a plurality of internal forward primers and optionally a plurality of internal reverse primers, where each of the primers is designed to hybridize to the region of DNA immediately upstream and/or downstream of one of the polymorphic sites on the target chromosome, and optionally on additional chromosomes, where the region of hybridization is separated from the polymorphic site by a small number of bases, where the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, 31 to 60, and combinations thereof.
[00032] In some embodiments, a method is described for determining the presence or absence of fetal aneuploidy in a maternal tissue sample comprising fetal and maternal genomic DNA. The method includes (a) obtaining a mixture of fetal and maternal genomic DNA from said maternal tissue sample, (b) conducting massively parallel DNA sequencing of DNA fragments randomly selected from the mixture of fetal and maternal genomic DNA of step (a) to determine the sequence of said DNA fragments, (c) identifying the chromosomes to which the sequences obtained in step (b) belong, (d) using the data of step (c) to determine an amount of at least one first chromosome in said mixture of fetal and maternal genomic DNA, where said at least one first chromosome is presumed to be euploid in the fetus, (e) using the data of step (c) to determine an amount of a second chromosome in said mixture of fetal and maternal genomic DNA, where said second chromosome is suspected to be aneuploid in the fetus, (f) calculating the fraction of fetal DNA in the mixture of fetal and maternal genomic DNA, (g) calculating an expected distribution of the amount of the second target chromosome if the second target chromosome is euploid, using the number in step (d), (h) calculating an expected distribution of the amount of the second target chromosome if the second target chromosome is aneuploid, using the first number in step (d) and the fraction of fetal DNA in the mixture of fetal and maternal genomic DNA calculated in step (f), and (i) using a maximum likelihood or maximum a posteriori approach to determine whether the amount of the second chromosome as determined in step (e) is more likely to be part of the distribution calculated in step (g) or the distribution calculated in step (h), thereby indicating the presence or absence of fetal aneuploidy.
BRIEF DESCRIPTION OF THE DRAWINGS
[00033] The presently described embodiments will be further explained with reference to the attached drawings, in which like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently described embodiments.
[00034] Figure 1 is a graphical representation of the direct multiplexed mini-PCR method.
[00035] Figure 2 is a graphical representation of the semi-nested mini-PCR method.
[00036] Figure 3 is a graphical representation of the fully nested mini-PCR method.
[00037] Figure 4 is a graphical representation of the hemi-nested mini-PCR method.
[00038] Figure 5 is a graphical representation of the triply hemi-nested mini-PCR method.
[00039] Figure 6 is a graphical representation of the one-sided nested mini-PCR method.
[00040] Figure 7 is a graphical representation of the one-sided mini-PCR method.
[00041] Figure 8 is a graphical representation of the reverse semi-nested mini-PCR method.
[00042] Figure 9 shows some possible workflows for semi-nested methods.
[00043] Figure 10 is a graphical representation of looped ligation adapters.
[00044] Figure 11 is a graphical representation of internally tagged primers.
[00045] Figure 12 is an example of some primers with internal tags.
[00046] Figure 13 is a graphical representation of a method using primers with a ligation adapter binding region.
[00047] Figure 14 shows simulated ploidy determination accuracy for the counting method with two different analysis techniques.
[00048] Figure 15 shows the ratio of two alleles for a plurality of SNPs in a cell line in Experiment 4.
[00049] Figure 16 shows the ratio of two alleles for a plurality of SNPs in a cell line in Experiment 4, sorted by chromosome.
[00050] Figure 17 shows the ratio of two alleles for a plurality of SNPs in four plasma samples from pregnant women, sorted by chromosome.
[00051] Figure 18 is a fraction of data that can be explained by binomial variance before and after data correction.
[00052] Figure 19 is a graph showing the relative enrichment of fetal DNA in samples following a short library preparation protocol.
[00053] Figure 20 is a reading depth graph comparing direct and semi-nested PCR methods.
[00054] Figure 21 is a comparison of reading depth for direct PCR of three genomic samples.
[00055] Figure 22 is a comparison of reading depth for semi-nested mini-PCR of three samples.
[00056] Figure 23 is a comparison of reading depth for 1,200 plex and 9,600 plex reactions.
[00057] Figure 24 shows read count ratios for six cells at three chromosomes.
[00058] Figure 25 shows allele ratios for two three-cell reactions and a third reaction run on 1 ng of genomic DNA at three chromosomes.
[00059] Figure 26 shows allele ratios for two single-cell reactions at three chromosomes.
[00060] While the drawings identified above set forth presently described embodiments, other embodiments are also contemplated, as noted in the discussion. This description presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art that fall within the scope and spirit of the principles of the presently described embodiments.
DETAILED DESCRIPTION
[00061] In one embodiment, the present description provides ex vivo methods for determining the ploidy state of a chromosome in a gestating fetus from genotypic data measured on a mixed DNA sample (i.e., DNA from the mother of the fetus and DNA from the fetus), and optionally from genotypic data measured on a sample of genetic material from the mother and possibly also from the father. The determination is made by using a joint distribution model to create a set of expected allele distributions for the different possible fetal ploidy states given the parental genotypic data, comparing the expected allele distributions with the actual allele distributions measured in the mixed sample, and choosing the ploidy state whose expected allele distribution pattern most closely matches the observed allele distribution pattern. In one embodiment, the mixed sample is derived from maternal blood, or from maternal serum or plasma. In one embodiment, the mixed DNA sample may be preferentially enriched at a plurality of polymorphic loci. In one embodiment, the preferential enrichment is done in a way that minimizes allelic bias. In one embodiment, the present description relates to a composition of DNA that has been preferentially enriched at a plurality of loci such that the allelic bias is low. In one embodiment, the allele distribution(s) are measured by sequencing the DNA from the mixed sample. In one embodiment, the joint distribution model assumes that the alleles are distributed binomially. In one embodiment, the set of expected joint allele distributions is created for genetically linked loci while taking into account existing recombination frequencies from various sources, for example, data from the International HapMap Consortium.
[00062] In one embodiment, the present description provides methods for non-invasive prenatal diagnosis (NPD), specifically, for determining the aneuploidy state of a fetus by observing allele measurements at a plurality of polymorphic loci in genotypic data measured on DNA mixtures, where certain allele measurements are indicative of an aneuploid fetus, while other allele measurements are indicative of a euploid fetus. In one embodiment, the genotypic data is measured by sequencing DNA mixtures that were derived from maternal plasma. In one embodiment, the DNA sample may be preferentially enriched in DNA molecules that correspond to the plurality of loci whose allele distributions are being calculated. In one embodiment, a DNA sample comprising only or almost only genetic material from the mother, and possibly also a DNA sample comprising only or almost only genetic material from the father, are measured. In one embodiment, the genetic measurements of one or both parents, together with the estimated fetal fraction, are used to create a plurality of expected allele distributions corresponding to different possible underlying genetic states of the fetus; the expected allele distributions may be termed hypotheses. In one embodiment, the maternal genetic data is not determined by measuring genetic material that is exclusively or almost exclusively maternal in nature, but is instead estimated from the genetic measurements made on maternal plasma that comprises a mixture of maternal and fetal DNA. In some embodiments, the hypotheses may comprise the ploidy of the fetus at one or more chromosomes, which segments of which chromosomes in the fetus were inherited from which parent, and combinations thereof. In some embodiments, the ploidy state of the fetus is determined by comparing the observed allele measurements with the different hypotheses, where at least some of the hypotheses correspond to different ploidy states, and selecting the ploidy state that corresponds to the hypothesis that is most likely to be true given the observed allele measurements. In one embodiment, this method involves using allele measurement data from some or all of the measured SNPs, regardless of whether the loci are homozygous or heterozygous, and therefore is not restricted to alleles at loci that are heterozygous. This method may not be appropriate for situations where the genetic data pertains to only one polymorphic locus. This method is particularly advantageous when the genetic data comprises data for more than ten polymorphic loci for a target chromosome or more than twenty polymorphic loci. This method is especially advantageous when the genetic data comprises data for more than 50 polymorphic loci for a target chromosome, more than 100 polymorphic loci or more than 200 polymorphic loci for a target chromosome. In some embodiments, the genetic data may comprise data for more than 500 polymorphic loci for a target chromosome, more than 1,000 polymorphic loci, more than 2,000 polymorphic loci, or more than 5,000 polymorphic loci for a target chromosome.
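To show how parental genotypes and the fetal fraction can give rise to different expected allele distributions, the following sketch computes the expected fraction of one allele at a single SNP under a given hypothesis. It is an illustration under simplifying assumptions (no measurement noise, no allelic bias, and a fetal fraction expressed as the share of genome equivalents in the sample that are fetal); the function and parameter names are hypothetical.

```python
def expected_b_fraction(maternal_b, fetal_b, fetal_copies, fetal_fraction):
    """Expected fraction of B-allele reads at one SNP in maternal plasma.
    maternal_b: number of B alleles in the mother's genotype (0, 1 or 2).
    fetal_b: number of B alleles in the fetus under the hypothesis.
    fetal_copies: chromosome copies in the fetus (2 = disomy, 3 = trisomy).
    fetal_fraction: share of genome equivalents in the sample that are fetal."""
    f = fetal_fraction
    b_copies = (1 - f) * maternal_b + f * fetal_b
    all_copies = (1 - f) * 2 + f * fetal_copies
    return b_copies / all_copies

# Mother AA, father BB, fetal fraction 10% (illustrative values):
disomy = expected_b_fraction(0, 1, 2, 0.10)             # fetus AB
maternal_trisomy = expected_b_fraction(0, 1, 3, 0.10)   # fetus AAB
paternal_trisomy = expected_b_fraction(0, 2, 3, 0.10)   # fetus ABB
```

For these inputs the three hypotheses predict B-allele fractions of about 0.050, 0.048 and 0.095 respectively. The closeness of the first two values at a single SNP illustrates why the method benefits from combining many loci.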
[00063] In one embodiment, a method described herein uses selective enrichment techniques that preserve the relative allele frequencies present in the original DNA sample at each polymorphic locus in a set of polymorphic loci. In some embodiments, the amplification and/or selective enrichment technique may involve PCR such as ligation-mediated PCR, fragment capture by hybridization, Molecular Inversion Probes, or other circularizing probes. In some embodiments, methods for amplification or selective enrichment may involve probes where, upon correct hybridization to the target sequence, the 3' end or 5' end of a nucleotide probe is separated from the polymorphic site of the allele by a small number of nucleotides. This separation reduces preferential amplification of one allele, termed allelic bias. This is an improvement over methods that use probes where the 3' end or 5' end of a correctly hybridized probe is directly adjacent to, or very near, the polymorphic site of an allele. In one embodiment, probes in which the hybridizing region may contain, or certainly contains, a polymorphic site are excluded. Polymorphic sites in the hybridization region can cause unequal hybridization or inhibit hybridization altogether for some alleles, resulting in preferential amplification of certain alleles. These embodiments are improvements over other methods that involve targeted amplification and/or selective enrichment in that they better preserve the original allele frequencies of the sample at each polymorphic locus, whether the sample is a pure genomic sample from a single individual or a mixture of individuals.
[00064] In one embodiment, a method described here uses highly efficient, highly multiplexed targeted PCR to amplify DNA, followed by high-throughput sequencing to determine the allele frequencies at each target locus. The ability to multiplex more than approximately 50 or 100 PCR primers in one reaction in such a way that most of the resulting sequence reads map to the targeted loci is novel and non-obvious. One technique that allows highly multiplexed targeted PCR to perform in a highly efficient manner involves designing primers that are unlikely to hybridize with one another. The PCR probes, typically called primers, are selected by creating a thermodynamic model of potentially adverse interactions between at least 500, at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 50,000 or at least 100,000 pairs of potential primers, or unintended interactions between the primers and the sample DNA, and then using the model to eliminate designs that are incompatible with other designs in the pool. Another technique that allows highly multiplexed targeted PCR to perform in a highly efficient manner is using a partial or full nesting approach to the targeted PCR. Using one or a combination of these approaches allows the multiplexing of at least 300, at least 800, at least 1,200, at least 4,000, or at least 10,000 primers in a single pool, with the resulting amplified DNA comprising a majority of DNA molecules that, when sequenced, will map to the targeted loci. Using one or a combination of these approaches allows the multiplexing of a large number of primers in a single pool, with the resulting amplified DNA comprising more than 50%, more than 80%, more than 90%, more than 95%, more than 98%, or more than 99% DNA molecules that map to the targeted loci.
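As an illustration only, the elimination of incompatible primer designs could be sketched as a greedy pruning loop. The pairwise scores, the threshold, and the names `prune_primer_pool` and `lookup_score` below are hypothetical stand-ins for the output of a thermodynamic interaction model; the text does not specify the actual selection procedure.

```python
def prune_primer_pool(primers, interaction_score, threshold):
    """Greedily drop primers until no remaining pair exceeds the threshold."""
    pool = set(primers)

    def conflicts(primer):
        # Number of other primers in the pool this primer interacts with too strongly.
        return sum(
            1 for other in pool
            if other != primer and interaction_score(primer, other) > threshold
        )

    while pool:
        # sorted() makes tie-breaking deterministic.
        worst = max(sorted(pool), key=conflicts)
        if conflicts(worst) == 0:
            break
        pool.remove(worst)
    return sorted(pool)


# Hypothetical pairwise scores (higher = more adverse), not real thermodynamic data.
SCORES = {
    frozenset(("p1", "p2")): 9.0,
    frozenset(("p1", "p3")): 8.0,
    frozenset(("p2", "p3")): 1.0,
    frozenset(("p3", "p4")): 2.0,
}


def lookup_score(a, b):
    """Symmetric lookup; unlisted pairs are assumed not to interact."""
    return SCORES.get(frozenset((a, b)), 0.0)


kept = prune_primer_pool(["p1", "p2", "p3", "p4"], lookup_score, threshold=5.0)
print(kept)
```

In this toy pool, the primer with the most over-threshold interactions is removed first, which leaves a pool with no incompatible pairs. A greedy rule is only one possible choice; it does not guarantee the largest compatible pool.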
[00065] In one embodiment, a method described here produces a quantitative measure of the number of independent observations of each allele at a polymorphic locus. This is unlike most methods, such as microarrays or qualitative PCR, which provide information about the ratio of two alleles but do not quantify the number of independent observations of either allele. With methods that provide quantitative information regarding the number of independent observations, only the ratio is used in ploidy calculations, while the quantitative information by itself is not useful. To illustrate the importance of retaining information about the number of independent observations, consider a sample locus with two alleles, A and B. In a first experiment, twenty A alleles and twenty B alleles are observed; in a second experiment, 200 A alleles and 200 B alleles are observed. In both experiments the ratio (A/(A+B)) is equal to 0.5; however, the second experiment carries more information than the first about the certainty of the frequency of the A or B allele. Some methods known in the prior art involve averaging or summing allele ratios (channel ratios) (i.e., xi/yi) from individual alleles and analyzing that ratio, either by comparing it with a reference chromosome or by using a rule pertaining to how that ratio is expected to behave in particular situations. No allele weighting is involved in such methods known in the art, where it is assumed that approximately the same amount of PCR product can be ensured for each allele and that all alleles should behave in the same way. Such a method has a number of disadvantages and, more importantly, precludes the use of a number of improvements that are described elsewhere in this description.
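The 20/20 versus 200/200 example can be made concrete with a short sketch. It assumes each observation is an independent draw and uses a Wilson score interval, which is one common choice and is not named in the text; the function name `allele_fraction_interval` is illustrative.

```python
import math


def allele_fraction_interval(a_count, b_count, z=1.96):
    """Wilson score interval (about 95% for z=1.96) for the fraction A/(A+B)."""
    n = a_count + b_count
    p = a_count / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


low_depth = allele_fraction_interval(20, 20)     # first experiment
high_depth = allele_fraction_interval(200, 200)  # second experiment

for label, (lo, hi) in (("20/20", low_depth), ("200/200", high_depth)):
    print(f"{label}: ratio 0.5, interval {lo:.3f} to {hi:.3f}")
```

Both experiments give the same ratio, but the interval from 400 observations is roughly a third as wide as the interval from 40 observations, which is the extra certainty that a ratio alone discards.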
[00066] In one embodiment, a method described here explicitly models the allele frequency distributions expected in disomy, as well as the various allele frequency distributions that may be expected in cases of trisomy resulting from non-disjunction during meiosis I, non-disjunction during meiosis II, and/or non-disjunction during mitosis early in fetal development. To illustrate why this is important, imagine a case in which there were no crossovers: non-disjunction during meiosis I would result in a trisomy in which two different homologs were inherited from one parent; in contrast, non-disjunction during meiosis II or during mitosis early in fetal development would result in two copies of the same homolog from one parent. Each scenario would result in different expected allele frequencies at each polymorphic locus and also at all loci considered jointly, due to genetic linkage. Crossovers, which result in the exchange of genetic material between homologs, make the inheritance pattern more complex; in one embodiment, the present method accommodates this by using recombination rate information in addition to the physical distance between loci. In one embodiment, to enable improved distinction between meiosis I non-disjunction and meiosis II or mitotic non-disjunction, the present method incorporates into the model an increasing probability of crossover as the distance from the centromere increases. Meiosis II and mitotic non-disjunction can be distinguished by the fact that mitotic non-disjunction typically results in identical or nearly identical copies of one homolog, while the two homologs present following a meiosis II non-disjunction event often differ due to one or more crossovers during gametogenesis.
[00067] In some embodiments, a method described here involves comparing the observed allele measurements with theoretical hypotheses corresponding to possible fetal genetic aneuploidies, and does not involve a step of quantifying a ratio of alleles at a heterozygous locus. When the number of loci is lower than approximately 20, a ploidy determination made using a method comprising quantifying a ratio of alleles at a heterozygous locus and a ploidy determination made using a method comprising comparing the observed allele measurements with theoretical allele distribution hypotheses corresponding to possible fetal genetic states may give similar results. However, when the number of loci is above 50, these two methods are likely to give significantly different results; when the number of loci is above 400, above 1,000 or above 2,000, these two methods are very likely to give results that are increasingly significantly different. These differences arise because a method that comprises quantifying a ratio of alleles at a heterozygous locus without measuring the magnitude of each allele independently, and then aggregating or averaging the ratios, precludes the use of techniques including using a joint distribution model, performing a linkage analysis, using a binomial distribution model, and/or other advanced statistical techniques, whereas a method comprising comparing the observed allele measurements with theoretical allele distribution hypotheses corresponding to possible fetal genetic states can use these techniques, which can substantially increase the accuracy of the determination.
[00068] In one embodiment, a method described here involves determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus using a joint distribution model. The use of a joint distribution model is different from, and a significant improvement over, methods that determine heterozygosity rates by treating polymorphic loci independently, in that the resulting determinations are of significantly higher accuracy. Without being bound by any particular theory, it is believed that one reason they are of higher accuracy is that the joint distribution model takes into account the linkage between SNPs, and the likelihood of crossovers having occurred during the meiosis that gave rise to the gametes that formed the embryo that grew into the fetus. The purpose of using the concept of linkage when creating the expected distribution of allele measurements for one or more hypotheses is that it allows the creation of expected allele measurement distributions that correspond to reality considerably better than when linkage is not used. For example, imagine that there are two SNPs, 1 and 2, located near one another, and the mother is A at SNP 1 and A at SNP 2 on homolog one, and B at SNP 1 and B at SNP 2 on homolog two. If the father is A for both SNPs on both homologs, and a B is measured for the fetal SNP 1, this indicates that homolog two has been inherited by the fetus, and therefore that there is a much higher likelihood of a B being present in the fetus at SNP 2. A model that takes linkage into account would predict this, while a model that does not take linkage into account would not.
Alternatively, if a mother were AB at SNP 1 and AB at nearby SNP 2, then two hypotheses corresponding to maternal trisomy at that location could be used: one involving a matching copy error (non-disjunction during meiosis II or during mitosis in early fetal development), and one involving an unmatching copy error (non-disjunction during meiosis I). In the case of a matching copy error trisomy, if the fetus inherited an AA from the mother at SNP 1, then the fetus is much more likely to inherit either an AA or a BB from the mother at SNP 2, but not AB. In the case of an unmatching copy error, the fetus would inherit an AB from the mother at both SNPs. The allele distribution hypotheses made by a ploidy determination method that takes linkage into account would make these predictions, and would therefore correspond to the actual allele measurements to a considerably greater extent than a ploidy determination method that does not take linkage into account. Note that a linkage approach is not possible when using a method that relies on calculating allele ratios and aggregating those allele ratios.
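The two-SNP disomy example in this paragraph (mother A/A on homolog one and B/B on homolog two, father homozygous A) can be sketched as a small enumeration. The recombination probability values and the function names are assumptions made for illustration; the text gives no numerical values.

```python
def maternal_transmission_joint(hap1, hap2, recomb_prob):
    """Joint probability of the maternal alleles passed on at two linked SNPs.

    hap1 and hap2 are the mother's two homologs, each given as
    (allele at SNP 1, allele at SNP 2). Assumes disomy and at most one
    crossover between the two SNPs, occurring with probability recomb_prob.
    """
    haps = (hap1, hap2)
    joint = {}
    for start in (0, 1):               # homolog transmitted at SNP 1, probability 1/2 each
        for crossed in (False, True):  # whether a crossover occurs between the SNPs
            end = 1 - start if crossed else start
            prob = 0.5 * (recomb_prob if crossed else 1.0 - recomb_prob)
            alleles = (haps[start][0], haps[end][1])
            joint[alleles] = joint.get(alleles, 0.0) + prob
    return joint


def prob_snp2_given_snp1(joint, allele1, allele2):
    """P(maternal allele at SNP 2 = allele2 | maternal allele at SNP 1 = allele1)."""
    marginal = sum(p for (a1, _), p in joint.items() if a1 == allele1)
    return joint.get((allele1, allele2), 0.0) / marginal


# Father is A at both SNPs, so a fetal B at SNP 1 must have come from the mother.
linked = maternal_transmission_joint(("A", "A"), ("B", "B"), recomb_prob=0.01)
unlinked = maternal_transmission_joint(("A", "A"), ("B", "B"), recomb_prob=0.5)

print(prob_snp2_given_snp1(linked, "B", "B"))    # close to 1 when the SNPs are tightly linked
print(prob_snp2_given_snp1(unlinked, "B", "B"))  # 0.5 when linkage is ignored
```

Setting the recombination probability to 0.5 is equivalent to treating the loci independently, which is why a model without linkage cannot make the sharper prediction at SNP 2.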
[00069] One reason that ploidy determinations made using a method that comprises comparing the observed allele measurements with theoretical hypotheses corresponding to possible fetal genetic states are more accurate is that, when sequencing is used to measure the alleles, this method can extract more information from the allele data when the total number of reads is low than other methods can; for example, a method that relies on calculating and aggregating allele ratios would produce disproportionately weighted stochastic noise. For example, imagine a case that involves measuring the alleles using sequencing, and where there is a set of loci at which only five sequence reads were detected for each locus. In one embodiment, for each of the alleles, the data can be compared with the hypothesized allele distribution and weighted according to the number of sequence reads; the data from these measurements would therefore be appropriately weighted and incorporated into the overall determination. This is in contrast to a method that involves quantifying a ratio of alleles at a heterozygous locus, as that method could only calculate ratios of 0%, 20%, 40%, 60%, 80% or 100% as the possible allele ratios; none of these may be close to the expected allele ratios. In this latter case, the calculated allele ratios would either have to be discarded due to insufficient reads, or else would have disproportionate weighting and introduce stochastic noise into the determination, thereby decreasing the accuracy of the determination. In one embodiment, the individual allele measurements can be treated as independent measurements, where the relationship between measurements made on alleles at the same locus is no different from the relationship between measurements made on alleles at different loci.
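A minimal sketch of the contrast drawn above: with five reads, a ratio can take only six values, whereas a count-based binomial likelihood weights each locus by its depth automatically. The binomial read model, the example counts, and the function names are assumptions for illustration.

```python
import math


def possible_ratios(depth):
    """Every allele ratio that can be observed at a locus with `depth` reads."""
    return [k / depth for k in range(depth + 1)]


def binom_logpmf(k, n, p):
    """Log probability of k A-reads out of n reads when the expected A fraction is p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


def total_log_likelihood(counts, p):
    """Sum of per-locus log-likelihoods; counts is a list of (A reads, B reads)."""
    return sum(binom_logpmf(a, a + b, p) for a, b in counts)


print(possible_ratios(5))

# Evidence for an expected A fraction of 0.6 over 0.5 at a shallow and a deep locus.
shallow = [(3, 2)]
deep = [(30, 20)]
evidence_shallow = total_log_likelihood(shallow, 0.6) - total_log_likelihood(shallow, 0.5)
evidence_deep = total_log_likelihood(deep, 0.6) - total_log_likelihood(deep, 0.5)
print(evidence_shallow, evidence_deep)
```

Both loci have the same observed ratio of 0.6, but the locus with ten times the reads contributes ten times the log-likelihood difference, so low-depth loci are neither discarded nor over-weighted.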
[00070] In one embodiment, a method described here involves determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus without comparing any metrics with observed allele measurements on a reference chromosome that is expected to be disomic (termed the RC method). This is a significant improvement over methods, such as methods using shotgun sequencing, that detect aneuploidy by evaluating the proportion of randomly sequenced fragments from a suspect chromosome relative to one or more presumed disomic reference chromosomes. The RC method yields incorrect results if the presumed disomic reference chromosome is not actually disomic. This can occur in cases where the aneuploidy is more substantial than a single chromosome trisomy, or where the fetus is triploid and all autosomes are trisomic. In the case of a female triploid (69, XXX) fetus, there are in fact no disomic chromosomes at all. The method described here does not require a reference chromosome and would be able to correctly identify the trisomic chromosomes in a female triploid fetus. For each chromosome, hypothesis, child fraction and noise level, a joint distribution model can be fit without any of the following: reference chromosome data, an overall child fraction estimate, or a fixed reference hypothesis.
[00071] In one embodiment, a method described here demonstrates how observing allele distributions at polymorphic loci can be used to determine the ploidy state of a fetus with greater accuracy than prior art methods. In one embodiment, the method uses targeted sequencing to obtain mixed maternal-fetal genotypes, and optionally maternal and/or paternal genotypes, at a plurality of SNPs in order to first establish the various expected allele frequency distributions under the different hypotheses, and then observes the quantitative allele information obtained on the maternal-fetal mixture and evaluates which hypothesis fits the data best, where the genetic state corresponding to the hypothesis with the best fit to the data is called as the correct genetic state. In one embodiment, a method described here also uses the degree of fit to generate a confidence that the called genetic state is the correct genetic state. In one embodiment, a method described here involves using algorithms that analyze the distribution of alleles found for loci that have different parental contexts, and comparing the observed allele distributions with the expected allele distributions for different ploidy states for the different parental contexts (different parental genotypic patterns). This is different from, and an improvement over, methods that do not make it possible to estimate the number of independent instances of each allele at each locus in a mixed maternal-fetal sample. In one embodiment, a method described here involves determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus using observed allele distributions measured at loci where the mother is heterozygous.
This is different from, and an improvement over, methods that do not use observed allele distributions at loci where the mother is heterozygous because, in cases where the DNA is not preferentially enriched or is preferentially enriched for loci that are not known to be highly informative for that particular target individual, it allows the use of approximately twice as much genetic measurement data from a set of sequence data in the ploidy determination, resulting in a more accurate determination.
[00072] In one embodiment, a method described here uses a joint distribution model that assumes that the allele frequencies at each locus are multinomial (and thus binomial when the SNPs are biallelic) in nature. In some embodiments, the joint distribution model uses beta-binomial distributions. When a measuring technique such as sequencing provides a quantitative measure for each allele present at each locus, the binomial model can be applied to each locus, and the underlying allele frequency and the confidence in that frequency can be ascertained. With methods known in the art that generate ploidy determinations from allele ratios, or methods in which quantitative allele information is discarded, the certainty in the observed ratio cannot be ascertained. The present method is different from, and an improvement over, methods that calculate allele ratios and aggregate those ratios to make a ploidy determination, since any method that involves calculating an allele ratio at a particular locus and then aggregating those ratios necessarily assumes that the measured intensities or counts that are indicative of the amount of DNA from any given allele or locus will be distributed in a Gaussian fashion. The method described here does not involve calculating allele ratios. In some embodiments, a method described here may involve incorporating the number of observations of each allele at a plurality of loci into a model. In some embodiments, a method described here may involve calculating the expected distributions themselves, allowing the use of a joint binomial distribution model that may be more accurate than any model that assumes a Gaussian distribution of allele measurements. The likelihood that the binomial distribution model is significantly more accurate than the Gaussian distribution increases as the number of loci increases.
For example, when fewer than 20 loci are interrogated, the likelihood that the binomial distribution model is significantly better is low. However, when more than 100, or especially more than 400, or especially more than 1,000, or especially more than 2,000 loci are used, the binomial distribution model will have a very high likelihood of being significantly more accurate than the Gaussian distribution model, thereby resulting in a more accurate ploidy determination. The likelihood that the binomial distribution model is significantly more accurate than the Gaussian distribution also increases as the number of observations at each locus increases. For example, when fewer than 10 distinct sequences are observed at each locus, the likelihood that the binomial distribution model is significantly better is low. However, when more than 50 sequence reads, or especially more than 100 sequence reads, or especially more than 200 sequence reads, or especially more than 300 sequence reads are used for each locus, the binomial distribution model will have a very high likelihood of being significantly more accurate than the Gaussian distribution model, thereby resulting in a more accurate ploidy determination.
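As a sketch of the beta-binomial option mentioned in this paragraph, the log-probability of the reads at one locus can be written with the standard library alone. The mean/concentration parameterization and the example values are assumptions for illustration, and the loci are treated as independent given the hypothesis, which a model that accounts for linkage would not do.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binom_logpmf(k, n, mean, concentration):
    """Log probability of k A-reads out of n under a beta-binomial model.

    `mean` is the expected A-allele fraction under a hypothesis; smaller
    `concentration` means more locus-to-locus overdispersion. Large values
    of `concentration` approach the plain binomial.
    """
    alpha = mean * concentration
    beta = (1.0 - mean) * concentration
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def joint_log_likelihood(counts, expected_fractions, concentration):
    """Sum per-locus log-likelihoods; counts is a list of (A reads, total reads)."""
    return sum(
        beta_binom_logpmf(k, n, mean, concentration)
        for (k, n), mean in zip(counts, expected_fractions)
    )


# Hypothetical read counts at three loci and the fractions a hypothesis predicts.
counts = [(48, 100), (95, 100), (3, 60)]
print(joint_log_likelihood(counts, [0.5, 0.95, 0.05], concentration=200.0))
```

Because the model works on counts rather than ratios, loci with different read depths can share one likelihood without any assumption that the ratios are Gaussian.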
[00073] In one embodiment, a method described here uses sequencing to measure the number of instances of each allele at each locus in a DNA sample. Each sequencing read can be mapped to a specific locus and treated as a binary sequence read; alternatively, the probability of the identity of the read and/or of the mapping can be incorporated as part of the sequence read, resulting in a probabilistic sequence read, that is, the probable whole or fractional number of sequence reads that map to a given locus. Using the binary counts or the probabilistic counts, it is possible to use a binomial distribution for each set of measurements, allowing a confidence interval to be calculated around the number of counts. This ability to use the binomial distribution allows more accurate ploidy estimates and more precise confidence intervals to be calculated. This is different from, and an improvement over, methods that use intensities to measure the amount of an allele present, for example, methods that use microarrays, or methods that make measurements using fluorescence readers to measure the intensity of fluorescently tagged DNA in electrophoretic bands.
[00074] In one embodiment, a method described here uses aspects of the present data set to determine parameters for the estimated allele frequency distribution for that data set. This is an improvement over methods that use a training data set or prior data sets to set the parameters for the expected allele frequency distributions, or possibly the expected allele ratios. This is because differing sets of conditions are involved in the collection and measurement of every genetic sample, and thus a method that uses data from the present data set to determine the parameters for the joint distribution model used in the ploidy determination for that sample will tend to be more accurate.
[00075] In one embodiment, a method described here involves determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus using a maximum likelihood technique. The use of a maximum likelihood technique is different from, and a significant improvement over, methods that use a single hypothesis rejection technique, in that the resulting determinations will be made with significantly higher accuracy. One reason is that single hypothesis rejection techniques set cut-off thresholds based on only one measurement distribution rather than two, meaning that the thresholds are usually not optimal. Another reason is that the maximum likelihood technique allows the optimization of the cut-off threshold for each individual sample, instead of determining one cut-off threshold to be used for all samples regardless of the particular characteristics of each individual sample. Another reason is that the use of a maximum likelihood technique allows the calculation of a confidence for each ploidy determination. The ability to make a confidence calculation for each determination allows a practitioner to know which determinations are accurate and which are more likely to be wrong.
In some embodiments, a wide variety of methods can be combined with a maximum likelihood estimation technique to enhance the accuracy of the ploidy determinations. In one embodiment, the maximum likelihood technique can be used in combination with the method described in US Patent No. 7,888,017. In one embodiment, the maximum likelihood technique can be used in combination with the method of using targeted PCR amplification to amplify DNA in the mixed sample, followed by sequencing and analysis using a read counting method such as that used by TANDEM DIAGNOSTICS, as presented at the International Congress of Human Genetics 2011 in Montreal in October 2011. In one embodiment, a method described here involves estimating the fetal fraction of DNA in the mixed sample and using that estimate to calculate both the ploidy determination and the confidence of the ploidy determination. Note that this is different and distinct from methods that use the estimated fetal fraction as a screen for sufficient fetal fraction, followed by a ploidy determination made using a single hypothesis rejection technique that neither takes the fetal fraction into account nor produces a confidence calculation for the determination.
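A minimal sketch of selecting the maximum likelihood hypothesis and attaching a confidence to the call is given below. It assumes that a log-likelihood has already been computed for each hypothesis and that all hypotheses have equal prior probability; both the equal prior and the example numbers are illustrative assumptions rather than details taken from the text.

```python
import math


def call_ploidy(log_likelihoods):
    """Return (most likely hypothesis, confidence) from per-hypothesis log-likelihoods.

    Confidence is the normalized likelihood of the chosen hypothesis, which
    equals its posterior probability under an assumed uniform prior.
    """
    peak = max(log_likelihoods.values())
    # Subtract the peak before exponentiating to avoid numerical underflow.
    weights = {h: math.exp(ll - peak) for h, ll in log_likelihoods.items()}
    total = sum(weights.values())
    posteriors = {h: w / total for h, w in weights.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]


# Hypothetical log-likelihoods of the observed allele counts under three hypotheses.
example = {
    "disomy": -1050.2,
    "maternal trisomy": -1043.1,
    "paternal trisomy": -1061.7,
}
call, confidence = call_ploidy(example)
print(call, round(confidence, 4))
```

Because the comparison is made between hypotheses for the sample at hand, no fixed cut-off is needed, and a sample whose hypotheses have nearly equal likelihoods is reported with a correspondingly low confidence.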
[00076] In one embodiment, a method described here takes into account the tendency for the data to be noisy and to contain errors by attaching a probability to each measurement. The use of maximum likelihood techniques to choose the correct hypothesis from the set of hypotheses that were made using the measurement data with attached probabilistic estimates makes it more likely that incorrect measurements will be discounted and that correct measurements will be used in the calculations that lead to the ploidy determination. To be more precise, this method systematically reduces the influence of incorrectly measured data on the ploidy determination. This is an improvement over methods where all data is assumed to be equally correct, or methods where outlying data is arbitrarily excluded from the calculations leading to a ploidy determination. Existing methods using channel ratio measurements attempt to extend the method to multiple SNPs by averaging individual SNP channel ratios. Failing to weight individual SNPs by the expected measurement variance, based on the quality of the SNP and the observed depth of read, reduces the accuracy of the resulting statistic and thereby significantly reduces the accuracy of the ploidy determination, especially in borderline cases.
[00077] In one embodiment, a method described here does not presuppose knowledge of which SNPs or other polymorphic loci are heterozygous in the fetus. This method allows a ploidy determination to be made in cases where paternal genotypic information is not available. This is an improvement over methods in which it must be known in advance which SNPs are heterozygous in order to appropriately select the loci to target, or to interpret the genetic measurements made on the mixed fetal/maternal DNA sample.
[00078] The methods described here are particularly advantageous when used on samples where a small amount of DNA is available, or where the percentage of fetal DNA is low. This is due to the correspondingly higher allele dropout rate that occurs when only a small amount of DNA is available and/or the correspondingly higher fetal allele dropout rate when the percentage of fetal DNA is low in a mixed sample of fetal and maternal DNA. A high allele dropout rate, meaning that a large percentage of the alleles were not measured for the target individual, results in inaccurate fetal fraction calculations and inaccurate ploidy determinations. Since the methods described here can use a joint distribution model that takes into account the linkage in inheritance patterns between SNPs, significantly more accurate ploidy determinations can be made. The methods described here allow an accurate ploidy determination to be made when the percentage of DNA molecules in the mixture that are fetal is less than 40%, less than 30%, less than 20%, less than 10%, less than 8%, and even less than 6%.
[00079] In one embodiment, it is possible to determine the ploidy state of an individual based on measurements made when that individual's DNA is mixed with the DNA of a related individual. In one embodiment, the mixture of DNA is the free-floating DNA found in maternal plasma, which may include DNA from the mother, with known karyotype and known genotype, and which may be mixed with DNA of the fetus, with unknown karyotype and unknown genotype. It is possible to use the known genotypic information from one or both parents to predict a plurality of potential genetic states of the DNA in the mixed sample for different ploidy states, different chromosome contributions from each parent to the fetus, and optionally, different fetal DNA fractions in the mixture. Each potential composition may be referred to as a hypothesis. The ploidy state of the fetus can then be determined by looking at the actual measurements and determining which potential compositions are most likely given the observed data.
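To illustrate how each hypothesis predicts a composition of the mixed sample, the sketch below computes the expected A-allele fraction at a locus where the mother is AA and the father is BB. It assumes the mother is disomic, that the fetal fraction is defined relative to a disomic chromosome, and that there is no measurement bias; the function name and the 10% fetal fraction are illustrative.

```python
def expected_a_fraction(maternal_a, fetal_a, fetal_copies, fetal_fraction):
    """Expected fraction of A alleles in the maternal-fetal DNA mixture at one locus.

    maternal_a: number of A alleles in the (disomic) maternal genotype, 0 to 2.
    fetal_a: number of A alleles in the fetal genotype under the hypothesis.
    fetal_copies: fetal copy number of the chromosome under the hypothesis.
    """
    maternal_share = 1.0 - fetal_fraction
    a_signal = maternal_share * maternal_a + fetal_fraction * fetal_a
    total_signal = maternal_share * 2 + fetal_fraction * fetal_copies
    return a_signal / total_signal


# Mother AA, father BB: fetal genotype implied by each hypothesis,
# given as (number of A alleles, copy number).
HYPOTHESES = {
    "disomy": (1, 2),            # fetus AB
    "maternal trisomy": (2, 3),  # fetus AAB
    "paternal trisomy": (1, 3),  # fetus ABB
}

expected = {
    name: expected_a_fraction(2, fetal_a, copies, fetal_fraction=0.10)
    for name, (fetal_a, copies) in HYPOTHESES.items()
}
for name, fraction in expected.items():
    print(f"{name}: {fraction:.4f}")
```

At a 10% fetal fraction the three hypotheses predict A fractions that differ by only a few percentage points or less, which suggests why count-based likelihoods over many loci are needed to separate them.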
[00080] In some embodiments, a method described here could be used in situations where there is a very small amount of DNA present, such as in in vitro fertilization, or in forensic situations, where one or a few cells are available (typically fewer than ten cells, fewer than twenty cells or fewer than 40 cells). In these embodiments, a method described here serves to make ploidy determinations from a small amount of DNA that is not contaminated by other DNA, but where the ploidy determination is very difficult because of the small amount of DNA. In some embodiments, a method described here could be used in situations where the target DNA is contaminated with DNA of another individual, for example, in maternal blood in the context of prenatal diagnosis, paternity testing, or products of conception testing. Some other situations where these methods would be particularly advantageous would be in the case of cancer testing where only one or a small number of cells are present among a larger number of normal cells. The genetic measurements used as part of these methods could be made on any sample comprising DNA or RNA, for example, but not limited to: blood, plasma, body fluids, urine, hair, tears, saliva, tissue, skin, fingernails, blastomeres, embryos, amniotic fluid, chorionic villus samples, feces, bile, lymph, cervical mucus, or other cells or materials comprising nucleic acids. In one embodiment, a method described here could be run with nucleic acid detection methods such as sequencing, microarrays, qPCR, digital PCR, or other methods used to measure nucleic acids. If for some reason it were found to be desirable, the ratios of the allele count probabilities at a locus could be calculated, and the allele ratios could be used to determine the ploidy state in combination with some of the methods described here, provided the methods are compatible.
In some embodiments, a method described here involves calculating, on a computer, allele ratios at the plurality of polymorphic loci from DNA measurements made on processed samples. In some embodiments, a method described here involves calculating, on a computer, allele ratios at the plurality of polymorphic loci from DNA measurements made on processed samples, along with any combination of the other improvements described in this description.
[00081] Further discussion of the points above can be found elsewhere in this document.
Noninvasive prenatal diagnosis (NPD)
[00082] The process of noninvasive prenatal diagnosis involves a number of steps. Some of the steps may include: (1) obtaining the genetic material from the fetus; (2) enriching the genetic material of the fetus, which may be in a mixed sample, ex vivo; (3) amplifying the genetic material, ex vivo; (4) preferentially enriching specific loci in the genetic material, ex vivo; (5) measuring the genetic material, ex vivo; and (6) analyzing the genotypic data, on a computer, and ex vivo. Methods to reduce to practice these six and other relevant steps are described here. At least some of the method steps are not directly applied to the body. In one embodiment, the present description relates to methods of treatment and diagnosis applied to tissue and other biological materials isolated and separated from the body. At least some of the method steps are performed on a computer.
[00083] Some embodiments of the present description allow a clinician to determine the genetic state of a fetus that is gestating in a mother in a non-invasive manner, such that the health of the baby is not put at risk by the collection of the genetic material of the fetus, and the mother is not required to undergo an invasive procedure. Furthermore, in certain aspects, the present description allows the fetal genetic state to be determined with high accuracy, significantly greater accuracy than, for example, non-invasive screens based on maternal serum analytes, such as the triple test, that are in wide use in prenatal care.
[00084] The high accuracy of the methods described here is a result of a computerized approach to the analysis of the genotype data, as described here. Modern technological advances have resulted in the ability to measure large amounts of genetic information from a genetic sample using such methods as high-throughput sequencing and genotyping arrays. The methods described here allow a clinician to take greater advantage of the large amounts of data available and to make a more accurate diagnosis of the fetal genetic state. Details of a number of embodiments are given below. Different embodiments may involve different combinations of the steps mentioned above. Various combinations of the different embodiments of the different steps may be used interchangeably.
[00085] In one embodiment, a blood sample is taken from a pregnant mother, and the free-floating DNA in the plasma of the mother's blood, which contains a mixture of both maternal and fetal DNA, is isolated and used to determine the ploidy state of the fetus. In one embodiment, a method described here involves preferential enrichment of those DNA sequences in a mixture of DNA that correspond to polymorphic alleles, in such a way that the allele ratios and/or allele distributions remain mostly consistent upon enrichment. In one embodiment, a method described here involves highly efficient targeted PCR-based amplification such that a very high percentage of the resulting molecules correspond to the targeted loci. In one embodiment, a method described here involves sequencing a mixture of DNA that contains both maternal and fetal DNA. In one embodiment, a method described here involves using measured allele distributions to determine the ploidy state of a fetus that is gestating in a mother. In one embodiment, a method described here involves reporting the determined ploidy state to a clinician. In one embodiment, a method described here involves taking a clinical action, for example, performing follow-up invasive testing such as chorionic villus sampling or amniocentesis, preparing for the birth of a trisomic individual, or an elective termination of a trisomic fetus.
[00086] This application refers to U.S. Utility Application No. 11/603,406, filed November 28, 2006 (U.S. Publication No. 2007/0184467); U.S. Utility Application No. 12/076,348, filed March 17, 2008 (U.S. Publication No. 2008/0243398); PCT Utility Application No. PCT/US09/52730, filed August 4, 2009 (PCT Publication No. WO/2010/017214); PCT Utility Application No. PCT/US10/050824, filed September 30, 2010 (PCT Publication No. WO/2011/041485); and U.S. Utility Application No. 13/110,685, filed May 18, 2011. Some of the vocabulary used in this filing may have its antecedents in these references. Some of the concepts described here may be better understood in light of the concepts found in these references.
Screening of maternal blood comprising free fetal DNA
[00087] The methods described here can be used to help determine the genotype of a child, fetus, or other target individual where the target's genetic material is found in the presence of a quantity of other genetic material. In some embodiments, the genotype can refer to the ploidy state of one or more chromosomes, it can refer to one or more disease-linked alleles, or some combination thereof. In this description, the discussion focuses on determining the genetic state of a fetus where fetal DNA is found in maternal blood, but this example is not intended to limit the possible contexts to which this method can be applied. In addition, the method can be applicable in cases where the amount of target DNA is in any proportion with the non-target DNA; for example, the target DNA could make up between 0.000001 and 99.999999% of the DNA present. In addition, the non-target DNA does not necessarily have to be from one individual, or even from a related individual, as long as the genetic data of some or all of the relevant non-target individual(s) is known. In one embodiment, a method described here can be used to determine genotypic data for a fetus from maternal blood that contains fetal DNA.
It can also be used in a case where there are multiple fetuses in the womb of a pregnant woman, or where contaminating DNA may be present in the sample, for example, from other children (siblings) already born.
[00088] This technique can make use of the phenomenon of fetal blood cells gaining access to maternal circulation through the placental villi. Ordinarily, only a very small number of fetal cells enter the maternal circulation in this fashion (not enough to produce a positive Kleihauer-Betke test for fetal-maternal hemorrhage). The fetal cells can be sorted and analyzed by a variety of techniques to look for particular DNA sequences, but without the risks that invasive procedures inherently carry. This technique can also make use of the phenomenon of free fetal DNA gaining access to maternal circulation through DNA release following apoptosis of placental tissue, where the placental tissue in question contains DNA of the same genotype as the fetus. Free DNA in maternal plasma has been found to contain fetal DNA in proportions as high as 30 to 40% fetal DNA.
[00089] In one embodiment, blood can be drawn from a pregnant woman. Research has shown that maternal blood may contain a small amount of free DNA from the fetus, in addition to free DNA of maternal origin. In addition, there may also be nucleated fetal blood cells comprising DNA of fetal origin, in addition to many blood cells of maternal origin, which typically do not contain nuclear DNA. There are many methods known in the art to isolate fetal DNA, or to create fractions enriched in fetal DNA. For example, chromatography has been shown to create fractions that are enriched in fetal DNA.
[00090] Once the sample of maternal blood, plasma, or other fluid is in hand, having been drawn in a relatively non-invasive manner and containing an amount of fetal DNA, whether cellular or free, and whether enriched in its proportion to the maternal DNA or in its original ratio, it is possible to genotype the DNA found in said sample. In some embodiments, blood can be drawn using a needle to withdraw blood from a vein, for example, the basilic vein. The method described here can be used to determine the fetal genotypic data. For example, it can be used to determine the ploidy state of one or more chromosomes, or to determine the identity of one or a set of SNPs, including insertions, deletions, and translocations. It can be used to determine one or more haplotypes, including the parent of origin of one or more genotypic features.
[00091] Note that this method will work with any nucleic acids that can be used for any genotyping and/or sequencing method, such as the ILLUMINA INFINIUM ARRAY, AFFYMETRIX GENECHIP, ILLUMINA GENOME ANALYZER, or LIFE TECHNOLOGIES' SOLID SYSTEM. This includes free DNA extracted from plasma or amplifications thereof (for example, whole genome amplification, PCR), and genomic DNA from other cell types (for example, human lymphocytes from whole blood) or amplifications thereof. For DNA preparation, any extraction or purification method that yields genomic DNA suitable for one of these platforms will work as well. This method could work equally well with RNA samples. In one embodiment, the sample can be stored in a way that minimizes degradation (for example, below freezing, at approximately -20 °C, or at a lower temperature).
Parental Support
[00092] Some embodiments can be used in combination with the PARENTAL SUPPORT® (PS) method, embodiments of which are described in U.S. Application No. 11/603,406 (U.S. Publication No. 2007/0184467), U.S. Application No. 12/076,348 (U.S. Publication No. 2008/0243398), U.S. Application No. 13/110,685, PCT Application No. PCT/US09/52730 (PCT Publication No. WO/2010/017214), and PCT Application No. PCT/US10/050824 (PCT Publication No. WO/2011/041485), which are incorporated herein by reference. PARENTAL SUPPORT® is a computer-based approach that can be used to analyze genetic data. In some embodiments, the methods described here can be considered part of the PARENTAL SUPPORT® method. In some embodiments, the PARENTAL SUPPORT® method is a collection of methods that can be used to determine the genetic data of a target individual, with high precision, from one or a small number of cells of that individual, or from a mixture of DNA consisting of DNA from the target individual and DNA from one or more other individuals, specifically to determine disease-related alleles, other alleles of interest, and/or the ploidy state of one or more chromosomes in the target individual. PARENTAL SUPPORT® can refer to any of these methods. PARENTAL SUPPORT® is an example of a computer-based method.
[00093] The PARENTAL SUPPORT® method makes use of known parental genetic data, that is, haplotypic and/or diploid genetic data of the mother and/or the father, together with knowledge of the mechanism of meiosis and the imperfect measurement of the target DNA, and possibly of one or more related individuals, along with population-based crossover frequencies, in order to reconstruct, in silico and with a high degree of confidence, the genotype at a plurality of alleles, and/or the ploidy state of an embryo or of any target cell(s), and the target DNA at the location of key loci. The PARENTAL SUPPORT® method can reconstruct not only single nucleotide polymorphisms (SNPs) that were measured poorly, but also insertions and deletions, and SNPs or whole regions of DNA that were not measured at all. In addition, the PARENTAL SUPPORT® method can both measure multiple disease-linked loci and screen for aneuploidy from a single cell. In some embodiments, the PARENTAL SUPPORT® method can be used to characterize one or more cells from embryos biopsied during an IVF cycle to determine the genetic condition of the one or more cells.
[00094] The PARENTAL SUPPORT® method allows the cleaning of noisy genetic data. This can be done by inferring the correct genetic alleles in the target genome (embryo) using the genotype of related individuals (parents) as a reference. The PARENTAL SUPPORT® method can be particularly relevant when only a small amount of genetic material is available (for example, PGD) and when direct measurements of the genotypes are inherently noisy due to the limited amounts of genetic material. The PARENTAL SUPPORT® method can be particularly relevant when only a small fraction of the available genetic material is from the target individual (for example, NPD) and when direct measurements of the genotypes are inherently noisy due to the contaminating DNA signal from another individual. The PARENTAL SUPPORT® method is able to reconstruct highly accurate ordered diploid allele sequences in the embryo, along with the copy number of chromosome segments, even though conventional unordered diploid measurements can be characterized by high rates of allele dropouts, drop-ins, variable amplification biases, and other errors. The method can employ both an underlying genetic model and an underlying measurement error model. The genetic model can determine both allele probabilities at each SNP and crossover probabilities between SNPs. Allele probabilities can be modeled at each SNP based on data obtained from the parents, and crossover probabilities between SNPs can be modeled based on data obtained from the HapMap database, as developed by the International HapMap Project. Given the appropriate underlying genetic model and measurement error model, maximum a posteriori (MAP) estimation can be used, with modifications for computational efficiency, to estimate the correct ordered allele values at each SNP in the embryo.
[00095] The techniques described above, in some cases, are able to determine the genotype of an individual given a very small amount of DNA originating from that individual. This could be the DNA of one or a small number of cells, or it could be the small amount of fetal DNA found in maternal blood.
Definitions
[00096] Single nucleotide polymorphism (SNP) refers to a single nucleotide that can differ between the genomes of two members of the same species. The use of the term should not imply any limit on the frequency with which each variant occurs.
[00097] Sequence refers to a DNA sequence or a genetic sequence. It can refer to the primary physical structure of the DNA molecule or DNA strand in an individual. It can refer to the nucleotide sequence found in that DNA molecule, or the complementary strand to the DNA molecule. It can refer to the information contained in the DNA molecule as its in silico representation.
[00098] Locus refers to a particular region of interest in an individual's DNA, which can refer to a SNP, the site of a possible insertion or deletion, or the site of some other relevant genetic variation. Disease-linked SNPs can also refer to disease-linked loci.
[00099] Polymorphic allele, also "Polymorphic locus", refers to an allele or locus where the genotype varies between individuals within a given species. Some examples of polymorphic alleles include single nucleotide polymorphisms, short tandem repeats, deletions, duplications, and inversions.
[000100] Polymorphic site refers to the specific nucleotides found in a polymorphic region that vary between individuals.
[000101] Allele refers to the genes that occupy a particular locus.
[000102] Genetic data, also "Genotypic data", refers to data that describe aspects of the genome of one or more individuals. It can refer to one or a set of loci, partial or whole sequences, partial or whole chromosomes, or the entire genome. It can refer to the identity of one or a plurality of nucleotides; it can refer to a set of sequential nucleotides, or nucleotides from different locations in the genome, or a combination thereof. Genotypic data is typically in silico; however, it is also possible to consider physical nucleotides in a sequence as chemically encoded genetic data. Genotypic data can be said to be "on", "of", "at", or "from" the individual(s). Genotypic data can refer to output measurements from a genotyping platform where these measurements are made on genetic material.
[000103] Genetic material, also "genetic sample", refers to physical matter, such as tissue or blood, from one or more individuals comprising DNA or RNA.
[000104] Noisy genetic data refers to genetic data with any of the following: allele dropouts, uncertain base pair measurements, incorrect base pair measurements, missing base pair measurements, uncertain measurements of insertions or deletions, uncertain measurements of chromosome segment copy numbers, spurious signals, missing measurements, other errors, or combinations thereof.
[000105] Confidence refers to the statistical probability that the SNP, allele, set of alleles, determination of ploidy, or determined number of copies of chromosome segment correctly represents the actual genetic state of the individual.
[000106] Ploidy determination, also "Chromosome copy number calling" or "copy number calling" (CNC), can refer to the act of determining the quantity and/or chromosomal identity of one or more chromosomes present in a cell.
[000107] Aneuploidy refers to the state where the wrong number of chromosomes is present in a cell. In the case of a somatic human cell, it can refer to the case where a cell does not contain 22 pairs of autosomal chromosomes and one pair of sex chromosomes. In the case of a human gamete, it can refer to the case where a cell does not contain one of each of the 23 chromosomes. In the case of a single chromosome type, it can refer to the case in which more or fewer than two homologous but non-identical chromosome copies are present, or in which two chromosome copies are present that originate from the same parent.
[000108] Ploidy state refers to the quantity and/or chromosomal identity of one or more chromosome types in a cell.
[000109] Chromosome can refer to a single chromosome copy, meaning a single DNA molecule, of which there are 46 in a normal somatic cell; one example is 'maternally derived chromosome 18'. Chromosome can also refer to a chromosome type, of which there are 23 in a normal human somatic cell; an example is 'chromosome 18'.
[000110] Chromosomal identity can refer to the chromosome number, that is, the chromosome type. Normal humans have 22 types of numbered autosomal chromosomes and two types of sex chromosomes. It can also refer to the parental origin of the chromosome and to the specific chromosome inherited from that parent. It can also refer to other identifying features of a chromosome.
[000111] The state of genetic material or simply "Genetic state" can refer to the identity of a set of SNPs in DNA, to the phased haplotypes of genetic material, and to the sequence of DNA, including insertions, deletions, repetitions and mutations. It can also refer to the ploidy state of one or more chromosomes, chromosomal segments, or set of chromosomal segments.
[000112] Allelic data refers to a set of genotypic data concerning a set of one or more alleles. It can also refer to phased haplotype data. It can refer to SNP identities, and it can refer to DNA sequence data, including insertions, deletions, repetitions, and mutations. It can include the parental origin of each allele.
[000113] Allelic state refers to the actual state of genes in a set of one or more alleles. It can refer to the actual state of the genes described by the allelic data.
[000114] Allelic ratio, or allele ratio, refers to the ratio between the quantities of each allele at a locus that are present in a sample or in an individual. When the sample was measured by sequencing, the allelic ratio can refer to the ratio of sequence reads that map to each allele at the locus. When the sample was measured by an intensity-based measurement method, the allelic ratio can refer to the ratio of the quantities of each allele present at that locus as estimated by the measurement method.
[000115] Allele count refers to the number of sequences that map to a particular locus, and if that locus is polymorphic, it refers to the number of sequences that map to each of the alleles. If each allele is counted in a binary fashion, then the allele count will be a whole number. If alleles are counted probabilistically, then the allele count can be a fractional number.
[000116] Allele count probability refers to the number of sequences that are likely to map to a particular locus or set of alleles at a polymorphic locus, combined with the probability of the mapping. Note that allele counts are equivalent to allele count probabilities where the probability of the mapping for each counted sequence is binary (zero or one). In some embodiments, the allele count probabilities can be set equal to the DNA measurements.
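The relationship between binary allele counts and allele count probabilities can be illustrated with a minimal Python sketch. It assumes each sequence read is represented as a hypothetical (allele, mapping probability) pair; the function name and data layout are illustrative only and are not taken from the specification.

```python
from collections import defaultdict


def allele_count_probabilities(reads):
    """Sum per-read mapping probabilities for each allele at one locus.

    `reads` is an iterable of (allele, mapping_probability) pairs.
    This representation is an assumption made for illustration.
    """
    totals = defaultdict(float)
    for allele, probability in reads:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("mapping probability must lie in [0, 1]")
        totals[allele] += probability
    return dict(totals)


# Binary mapping probabilities (0 or 1) reduce to ordinary whole-number counts.
binary_reads = [("A", 1.0), ("A", 1.0), ("B", 1.0), ("B", 0.0)]
binary_counts = allele_count_probabilities(binary_reads)

# Fractional mapping probabilities give fractional allele counts.
weighted_reads = [("A", 0.75), ("A", 0.5), ("B", 0.25)]
weighted_counts = allele_count_probabilities(weighted_reads)
```

With binary probabilities the result is {"A": 2.0, "B": 1.0}; with the fractional probabilities above it is {"A": 1.25, "B": 0.25}.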
[000117] Allelic distribution, or 'allele count distribution', refers to the relative quantity of each allele that is present for each locus in a set of loci. An allelic distribution can refer to an individual, a sample, or a set of measurements made on a sample. In the context of sequencing, the allelic distribution refers to the number or probable number of reads that map to a particular allele for each allele in a set of polymorphic loci. Allele measurements can be treated probabilistically, that is, the probability that a given allele is present for a given sequence read is a fraction between 0 and 1, or they can be treated in a binary fashion, that is, any given read is considered to be exactly zero or one copy of a particular allele.
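As a rough illustration of an allelic distribution computed from sequencing counts, the following Python sketch converts per-locus allele counts into relative quantities. The locus names, the nested-dictionary layout, and the choice to report loci with no reads as None are assumptions made for this example.

```python
def allele_distribution(counts_by_locus):
    """Return the relative quantity of each allele at each locus.

    `counts_by_locus` maps a locus name to a dict of allele -> count
    (whole or fractional). Loci with no reads are reported as None,
    an arbitrary choice for this sketch.
    """
    distribution = {}
    for locus, counts in counts_by_locus.items():
        total = sum(counts.values())
        if total == 0:
            distribution[locus] = None
            continue
        distribution[locus] = {
            allele: count / total for allele, count in counts.items()
        }
    return distribution


example = allele_distribution(
    {
        "locus_1": {"A": 30, "B": 10},  # hypothetical read counts
        "locus_2": {"A": 0, "B": 0},    # locus with no coverage
    }
)
```

Here locus_1 yields {"A": 0.75, "B": 0.25} and locus_2 yields None.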
[000118] Allele distribution pattern refers to a set of different allele distributions for different parental contexts. Certain patterns of allelic distribution may be indicative of certain ploidy states.
[000119] Allelic bias refers to the degree to which the measured ratio of alleles at a heterozygous locus differs from the ratio that was present in the original DNA sample. The degree of allelic bias at a particular locus is equal to the allelic ratio observed at that locus, as measured, divided by the allelic ratio in the original DNA sample at that locus. The allelic bias can be defined as being greater than one, such that if the calculation of the degree of allelic bias returns a value, x, that is less than 1, then the degree of allelic bias can be restated as 1/x. Allelic bias may be due to amplification bias, purification bias, or some other phenomenon that affects different alleles differently.
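The degree of allelic bias defined above can be sketched in a few lines of Python. The function and parameter names are hypothetical; the only logic taken from the definition is the division of the observed ratio by the original ratio and the restatement of values below 1 as their reciprocal.

```python
def allelic_bias(observed_ratio, original_ratio):
    """Degree of allelic bias at one heterozygous locus.

    Computes x = observed_ratio / original_ratio and, following the
    convention that the bias is stated as a value of at least one,
    returns 1/x when x is less than 1.
    """
    if observed_ratio <= 0 or original_ratio <= 0:
        raise ValueError("allele ratios must be positive")
    x = observed_ratio / original_ratio
    return x if x >= 1 else 1 / x


no_bias = allelic_bias(1.0, 1.0)      # measured ratio equals original ratio
under_rep = allelic_bias(0.5, 1.0)    # x = 0.5, restated as 2.0
over_rep = allelic_bias(2.0, 1.0)     # x = 2.0
```

Under this convention, an allele that is under-represented by a factor of two and one that is over-represented by a factor of two both give a degree of allelic bias of 2.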
[000120] Primer, also "PCR probe", refers to a single DNA molecule (a DNA oligomer) or a collection of DNA molecules (DNA oligomers) where the DNA molecules are identical, or nearly so, and where the primer contains a region that is designed to hybridize to a targeted polymorphic locus and may contain a priming sequence designed to allow PCR amplification. A primer can also contain a molecular barcode. A primer can contain a random region that differs for each individual molecule.
[000121] Hybrid capture probe refers to any nucleic acid sequence, possibly modified, that is generated by various methods such as PCR or direct synthesis and is intended to be complementary to one strand of a specific target DNA sequence in a sample. The exogenous hybrid capture probes can be added to a prepared sample and hybridized through a denaturation-reannealing process to form duplexes of exogenous-endogenous fragments. These duplexes can then be physically separated from the sample by various means.
[000122] Sequence read refers to data representing a sequence of nucleotide bases that were measured using a clonal sequencing method. Clonal sequencing can produce sequence data representing a single original DNA molecule, or clones or clusters of one original DNA molecule. A sequence read can also have a quality score associated with each base position of the sequence, indicating the probability that the nucleotide was called correctly.
[000123] Mapping a sequence read is the process of determining a sequence read's location of origin in the genome sequence of a particular organism. The location of origin of sequence reads is based on the similarity between the nucleotide sequence of the read and the genome sequence.
[000124] Matched copy error, also "Matching chromosome aneuploidy" (MCA), refers to a state of aneuploidy where a cell contains two identical or nearly identical chromosomes. This type of aneuploidy can arise during the formation of gametes in meiosis, and can be called a meiotic non-disjunction error. This type of error can also arise in mitosis. Matching trisomy can refer to the case where three copies of a given chromosome are present in an individual and two of the copies are identical.
[000125] Unmatched copy error, also "Unique chromosome aneuploidy" (UCA), refers to a state of aneuploidy in which a cell contains two chromosomes that are from the same parent, and that can be homologous but not identical. This type of aneuploidy can arise during meiosis, and can be called a meiotic error. Unmatched trisomy can refer to the case in which three copies of a given chromosome are present in an individual and two of the copies are from the same parent, and are homologous, but are not identical. Note that unmatched trisomy may refer to the case where two homologous chromosomes from one parent are present, and where some segments of the chromosomes are identical, while other segments are merely homologous.
[000126] Homologous chromosomes refer to chromosome copies that contain the same set of genes and that normally pair up during meiosis.
[000127] Identical chromosomes refer to copies of chromosomes that contain the same set of genes, and for each gene, they have the same set of alleles that are identical, or almost identical.
[000128] Allele dropout (ADO), also "allele exclusion", refers to the situation in which at least one of the base pairs in a set of base pairs from homologous chromosomes at a given allele is not detected.
[000129] Locus dropout (LDO), also "locus exclusion", refers to the situation in which both base pairs in a set of base pairs from homologous chromosomes at a given allele are not detected.
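The distinction between the ADO and LDO definitions above can be sketched in Python for a single heterozygous locus. This is a simplified illustration only: it assumes the true genotype is known and heterozygous, and that detection is reported as a set of observed alleles. The function name and return labels are hypothetical.

```python
def classify_dropout(expected_alleles, detected_alleles):
    """Classify a heterozygous locus as 'none', 'ADO', or 'LDO'.

    `expected_alleles` holds the two distinct alleles truly present,
    for example "AB". `detected_alleles` is the collection of alleles
    actually observed. Homozygous loci are out of scope for this sketch,
    since a single missing copy cannot be seen from allele identity alone.
    """
    expected = set(expected_alleles)
    if len(expected) != 2:
        raise ValueError("this sketch handles heterozygous loci only")
    detected = expected & set(detected_alleles)
    if not detected:
        return "LDO"   # neither allele was detected
    if detected != expected:
        return "ADO"   # exactly one allele was not detected
    return "none"      # both alleles were detected


one_missing = classify_dropout("AB", {"A"})
both_missing = classify_dropout("AB", set())
all_seen = classify_dropout("AB", {"A", "B"})
```

The three calls above return "ADO", "LDO", and "none" respectively.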
[000130] Homozygous refers to having similar alleles at corresponding chromosomal loci.
[000131] Heterozygous refers to having dissimilar alleles at corresponding chromosomal loci.
[000132] Heterozygosity rate refers to the rate of individuals in the population that have heterozygous alleles at a given locus. The heterozygosity rate can also refer to the expected or measured ratio of alleles at a given locus in an individual or in a sample of DNA.
[000133] Highly informative single nucleotide polymorphism (HISNP) refers to a SNP in which the fetus has an allele that is not present in the mother's genotype.
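The HISNP definition can be expressed as a simple check. The Python sketch below assumes genotypes are written as two-character strings such as "AA" or "AB"; this encoding and the function name are assumptions made for illustration.

```python
def is_hisnp(maternal_genotype, fetal_genotype):
    """Return True if the fetus carries an allele absent from the mother.

    Genotypes are given as strings of allele symbols, for example
    "AA" or "AB" (a hypothetical encoding for this sketch).
    """
    maternal_alleles = set(maternal_genotype)
    return any(allele not in maternal_alleles for allele in fetal_genotype)


informative = is_hisnp("AA", "AB")      # fetal B allele is absent in the mother
not_informative = is_hisnp("AB", "AA")  # every fetal allele is also maternal
```

In the first call the SNP qualifies as a HISNP; in the second it does not.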
[000134] Chromosome region refers to a segment of a chromosome, or an entire chromosome.
[000135] Segment of a chromosome refers to a section of a chromosome that can range in size from one base pair to the entire chromosome.
[000136] Chromosome refers to either an entire chromosome, or a segment or section of a chromosome.
[000137] Copies refers to the number of copies of a chromosome segment. It can refer to identical copies, or to non-identical, homologous copies of a chromosome segment where the different copies of the chromosome segment contain a substantially similar set of loci, and where one or more of the alleles are different. Note that in some cases of aneuploidy, such as the M2 copy error, it is possible to have some copies of the given chromosome segment that are identical, as well as some copies of the same chromosome segment that are not identical.
[000138] Haplotype refers to a combination of alleles at multiple loci that are typically inherited together on the same chromosome. Haplotype can refer to at least two loci or even an entire chromosome depending on the number of recombination events that have occurred between a given set of loci. Haplotype can also refer to a set of single nucleotide polymorphisms (SNPs) in a single chromatid that are statistically associated.
[000139] Haplotype data, also "phased data" or "ordered genetic data", refers to data from a single chromosome in a diploid or polyploid genome, that is, either the segregated maternal or the segregated paternal copy of a chromosome in a diploid genome.
[000140] Phasing refers to the action of determining an individual's haplotype genetic data given the unordered diploid (or polyploid) genetic data. It can refer to the action of determining which of the two genes in an allele, for a set of alleles found on a chromosome, is associated with each of the two homologous chromosomes in an individual.
[000141] Phased data refers to genetic data where one or more haplotypes have been determined.
[000142] Hypothesis refers to a possible ploidy state in a given set of chromosomes, or a set of possible allelic states in a given set of loci. The set of possibilities can comprise one or more elements.
[000143] Copy number hypothesis, also "Ploidy state hypothesis", refers to a hypothesis regarding the number of copies of a chromosome in an individual. It can also refer to a hypothesis regarding the identity of each of the chromosomes, including the parent of origin of each chromosome, and which of the parent's two chromosomes are present in the individual. It can also refer to a hypothesis regarding which chromosomes, or chromosome segments, if any, of a related individual correspond genetically to a given chromosome of an individual.
[000144] Target individual refers to the individual whose genetic state is being determined. In some embodiments, only a limited amount of DNA is available from the target individual. In some embodiments, the target individual is a fetus. In some embodiments, there may be more than one target individual. In some embodiments, each fetus that originated from a pair of parents can be considered a target individual. In some embodiments, the genetic data being determined is one or a set of allele determinations. In some embodiments, the genetic data being determined is a ploidy determination.
[000145] Related individual refers to any individual who is genetically related to the target individual, and thus shares haplotype blocks with the target individual. In one context, the related individual can be a genetic parent of the target individual, or any genetic material derived from a parent, such as a sperm, a polar body, an embryo, a fetus, or a child. It can also refer to a sibling, parent, or grandparent.
[000146] Sibling refers to any individual whose genetic parents are the same as those of the individual in question. In some embodiments, it can refer to a born child, an embryo, or a fetus, or one or more cells originating from a born child, an embryo, or a fetus. A sibling may also refer to a haploid individual that originates from one of the parents, such as a sperm, a polar body, or any other set of haplotypic genetic material. An individual can be considered a sibling of itself.
[000147] Fetal refers to "of the fetus", or "of the region of the placenta that is genetically similar to the fetus". In a pregnant woman, some portion of the placenta is genetically similar to the fetus, and the free fetal DNA found in maternal blood may have originated from the portion of the placenta with a genotype that matches the fetus. Note that the genetic information on half of the chromosomes in a fetus is inherited from the mother of the fetus. In some embodiments, the DNA from these maternally inherited chromosomes that came from a fetal cell is considered to be "of fetal origin", not "of maternal origin".
[000148] DNA of fetal origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the fetus.
[000149] DNA of maternal origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the mother.
[000150] Child can refer to an embryo, a blastomere, or a fetus. Note that in the embodiments presently described, the concepts described apply equally well to individuals who are a born child, a fetus, an embryo, or a set of cells therefrom. The use of the term child may simply mean that the individual referred to as the child is the genetic offspring of the parents.
[000151] Parent refers to the genetic mother or father of an individual. An individual typically has two parents, a mother and a father, although this may not necessarily be the case, as in genetic or chromosomal chimerism. A parent can be considered to be an individual.
[000152] Parental context refers to the genetic state of a given SNP on each of the two relevant chromosomes for one or both of the two parents of the target.
[000153] Developing as desired, also "developing normally", refers to a viable embryo implanted in a uterus and resulting in a pregnancy, and / or a continued pregnancy resulting in a live birth, and / or a child born free of chromosomal abnormalities, and / or a child born free of other unwanted genetic conditions such as disease-linked genes. The term "develop as desired" is intended to cover anything that may be desired by parents or health care providers. In some cases, "developing as desired" may refer to an unviable or viable embryo that is useful for medical research or other purposes.
[000154] Insertion in a uterus refers to the process of transferring an embryo to the uterine cavity in the context of in vitro fertilization.
[000155] Maternal plasma refers to the plasma part of the blood of a woman who is pregnant.
[000156] Clinical decision refers to any decision to take or not take an action that has an outcome that affects the health or survival of an individual. In the context of prenatal diagnosis, a clinical decision may refer to a decision to abort or not to abort a fetus. A clinical decision may also refer to a decision to conduct additional testing, to take actions to mitigate an undesirable phenotype, or to take actions to prepare for the birth of a child with abnormalities.
[000157] Diagnostic box refers to one or a combination of machines designed to perform one or more aspects of the methods described here. In one embodiment, the diagnostic box can be placed at a point of patient care. In one embodiment, the diagnostic box can perform targeted amplification followed by sequencing. In one embodiment, the diagnostic box can operate alone or with the help of a technician.
[000158] Computer-based method refers to a method that relies heavily on statistics to make sense of a large amount of data. In the context of prenatal diagnosis, it refers to a method designed to determine the ploidy state at one or more chromosomes or the allelic state at one or more alleles by statistically inferring the most likely state, rather than by directly physically measuring the state, given a large amount of genetic data, for example, from a molecular array or sequencing. In one embodiment of the present description, the computer-based technique can be one described in this patent. In one embodiment of the present description, it can be PARENTAL SUPPORT®.
[000159] Primary genetic data refers to the analog intensity signals that are output by a genotyping platform. In the context of SNP arrays, primary genetic data refers to the intensity signals before any genotype determination has been made. In the context of sequencing, primary genetic data refers to the analog measurements, analogous to the chromatogram, that come off the sequencer before the identity of any base pairs has been determined, and before the sequence has been mapped to the genome.
[000160] Secondary genetic data refers to the processed genetic data that are output by a genotyping platform. In the context of a SNP array, secondary genetic data refers to the allele determinations made by software associated with the SNP array reader, where the software has determined whether a given allele is present or not present in the sample. In the context of sequencing, secondary genetic data refers to the base pair identities of the sequences that have been determined, and possibly also where the sequences have been mapped to the genome.
[000161] Noninvasive prenatal diagnosis (NPD), also called "noninvasive prenatal screening" (NPS), refers to a method for determining the genetic state of a fetus that is being gestated by a mother, using genetic material found in the mother's blood, where the genetic material is obtained through an intravenous blood draw from the mother.
[000162] Preferential enrichment of DNA that corresponds to a locus, or preferential enrichment of DNA in a locus, refers to any method that results in the percentage of DNA molecules in a mixture of post-enrichment DNA that corresponds to the locus being greater than the percentage of DNA molecules in the pre-enrichment DNA mixture that corresponds to the locus. The method may involve the selective amplification of DNA molecules that correspond to the locus. The method may involve removing DNA molecules that do not correspond to the locus. The method may involve a combination of methods. The degree of enrichment is defined as the percentage of DNA molecules in the post-enrichment mixture that correspond to the locus divided by the percentage of DNA molecules in the pre-enrichment mixture that correspond to the locus. Preferential enrichment can be carried out at several loci. In some embodiments of the present description, the degree of enrichment is greater than 200. In some embodiments of the present description, the degree of enrichment is greater than 2,000. When preferential enrichment is performed at several loci, the degree of enrichment can refer to the average degree of enrichment for all loci in the loci set.
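The definition of the degree of enrichment given above can be illustrated with a short calculation. The following is a minimal sketch only; the function names and the molecule counts used as inputs are hypothetical and are not part of the present description.

```python
def degree_of_enrichment(pre_locus, pre_total, post_locus, post_total):
    """Fraction of molecules at the locus after enrichment divided by the
    fraction of molecules at the locus before enrichment."""
    if pre_locus <= 0 or pre_total <= 0 or post_total <= 0:
        raise ValueError("counts must be positive")
    pre_fraction = pre_locus / pre_total
    post_fraction = post_locus / post_total
    return post_fraction / pre_fraction


def mean_degree_of_enrichment(per_locus_degrees):
    """Average degree of enrichment over all loci in a set of loci."""
    if not per_locus_degrees:
        raise ValueError("at least one locus is required")
    return sum(per_locus_degrees) / len(per_locus_degrees)
```

For example, a locus that accounts for 1 molecule per million before enrichment and 500 molecules per million afterwards has a degree of enrichment of approximately 500.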
[000163] Amplification refers to a method that increases the number of copies of a DNA molecule.
[000164] Selective amplification can refer to a method that increases the number of copies of a particular DNA molecule, or of DNA molecules that correspond to a particular region of DNA. It may also refer to a method that increases the number of copies of a particular targeted DNA molecule, or targeted DNA region, more than it increases the number of copies of non-targeted DNA molecules or regions. Selective amplification can be a method of preferential enrichment.
[000165] Universal primer sequence refers to a DNA sequence that can be attached to a population of target DNA molecules, for example, by ligation, PCR, or ligation-mediated PCR. Once added to the target molecule population, primers specific to the universal primer sequences can be used to amplify the target population using a single pair of amplification primers. Universal primer sequences are not typically related to the target sequences.
[000166] Universal adapters, or "ligation adapters" or "library tags", are DNA molecules containing a universal primer sequence that can be covalently linked to the 5-prime and 3-prime ends of a population of target double-stranded DNA molecules. The addition of the adapters provides universal primer sequences at the 5-prime and 3-prime ends of the target population, from which PCR amplification can take place, amplifying all molecules in the target population using a single pair of amplification primers.
[000167] Targeting refers to a method used to selectively amplify, or otherwise preferentially enrich, those DNA molecules that correspond to a set of loci in a mixture of DNA.
[000168] Joint distribution model refers to a model that defines the probability of events defined in terms of multiple random variables, given a plurality of random variables defined on the same probability space, where the probabilities of the variables are linked. In some embodiments, the degenerate case where the probabilities of the variables are not linked can be used.
Hypotheses
[000169] In the context of this description, a hypothesis refers to a possible genetic state. It may refer to a possible ploidy state. It may refer to a possible allelic state. A set of hypotheses can refer to a set of possible genetic states, a set of possible allelic states, a set of possible ploidy states, or combinations thereof. In some embodiments, a set of hypotheses can be designed such that one hypothesis in the set will correspond to the actual genetic state of any given individual. In some embodiments, a set of hypotheses can be designed such that each possible genetic state can be described by at least one hypothesis in the set. In some embodiments of the present description, an aspect of a method is to determine which hypothesis corresponds to the actual genetic state of the individual in question.
[000170] In another embodiment of the present description, a step involves creating a hypothesis. In some modalities, it may be a hypothesis of the number of copies. In some modalities, it may involve a hypothesis as to which segments of a chromosome of each of the related individuals correspond genetically to which segments, if any, of the other related individuals. Creating a hypothesis can refer to the action of setting the limits of variables so that the entire set of possible genetic states that are under consideration are covered by those variables.
[000171] A "copy number hypothesis", also called a "ploidy hypothesis", or a "ploidy state hypothesis", can refer to a hypothesis regarding a possible ploidy state for a given copy of chromosome, type of chromosome, or section of a chromosome, in the target individual. It can also refer to the state of ploidy in more than one of the types of chromosomes in the individual. A set of copy number hypotheses can refer to a set of hypotheses where each hypothesis corresponds to a possible different ploidy state in an individual. A set of hypotheses can consider a set of possible ploidy states, a set of possible contributions from parental haplotypes, a set of possible percentages of fetal DNA in the mixed sample, or combinations thereof.
[000172] A normal individual contains one of each type of chromosome from each parent. However, due to errors in meiosis and mitosis, it is possible for an individual to have 0, 1, 2 or more of a given type of chromosome from each parent. In practice, it is rare to see more than two of a given chromosome from one parent. In this description, some embodiments only consider the possible hypotheses where 0, 1 or 2 copies of a given chromosome come from a parent; it is a trivial extension to consider more or fewer possible copies originating from a parent. In some embodiments, for a given chromosome, there are nine possible hypotheses: the three possible hypotheses considering 0, 1 or 2 chromosomes of maternal origin, multiplied by the three possible hypotheses considering 0, 1 or 2 chromosomes of paternal origin. Let (m, f) refer to the hypothesis where m is the number of copies of a given chromosome inherited from the mother, and f is the number of copies of that chromosome inherited from the father. The nine hypotheses are then (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1) and (2,2). These can also be written as H00, H01, H02, H10, H11, H12, H20, H21 and H22. The different hypotheses correspond to different ploidy states. For example, (1,1) refers to a normal disomic chromosome; (2,1) refers to maternal trisomy, and (0,1) refers to paternal monosomy. In some embodiments, the case where two chromosomes are inherited from one parent and one chromosome is inherited from the other parent can be further differentiated into two cases: one in which the two chromosomes are identical (matched copy error), and one in which the two chromosomes are homologous, but not identical (unmatched copy error). In these embodiments, there are sixteen possible hypotheses. It should be understood that it is possible to use other sets of hypotheses, and a different number of hypotheses.
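The nine (m, f) copy number hypotheses described above can be enumerated programmatically. The following is a minimal sketch; the function names are illustrative assumptions, and the H-label format simply follows the notation used in the preceding paragraph.

```python
from itertools import product


def copy_number_hypotheses(max_copies=2):
    """Return all (m, f) pairs, where m is the number of copies inherited
    from the mother and f the number of copies inherited from the father."""
    return list(product(range(max_copies + 1), repeat=2))


def hypothesis_label(hypothesis):
    """Format an (m, f) pair as a label such as 'H11'."""
    m, f = hypothesis
    return f"H{m}{f}"


def total_copies(hypothesis):
    """Total number of copies of the chromosome under the hypothesis."""
    m, f = hypothesis
    return m + f
```

With the default of at most two copies per parent, `copy_number_hypotheses()` yields nine pairs; raising `max_copies` gives the extension to more copies per parent mentioned above.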
[000173] In some embodiments of the present description, the ploidy hypothesis refers to a hypothesis considering which chromosome of other related individuals corresponds to a chromosome found in the genome of the target individual. In some modalities, a key to the method is the fact that related individuals can be expected to share blocks of haplotypes, and using measured genetic data from related individuals, along with knowledge of which blocks of haplotypes coincide between the target individual and the related individual, it is possible to infer the correct genetic data for a target individual with greater confidence than using only the genetic measurements of the target individual. As such, in some modalities, the ploidy hypothesis may consider not only the number of chromosomes, but also which chromosomes in related individuals are identical, or almost identical, with one or more chromosomes in the target individual.
[000174] Once the set of hypotheses has been defined, when the algorithms operate on the input genetic data, they can output a determined statistical probability for each of the hypotheses under consideration. The probabilities of the various hypotheses can be determined by mathematically calculating, for each of the various hypotheses, the value of the probability, as determined by one or more of the expert techniques, algorithms, and/or methods described in this description, using the relevant genetic data as input.
[000175] Once the probabilities of the different hypotheses are estimated, as determined by a plurality of techniques, they can be combined. This can entail, for each hypothesis, multiplying the probabilities as determined by each technique. The product of the probabilities of the hypotheses can be normalized. Note that a ploidy hypothesis refers to a possible ploidy state for a chromosome.
[000176] The process of "combining probabilities", also called "combining hypotheses", or combining the results of expert techniques, is a concept that should be familiar to those skilled in the art of linear algebra. One possible way of combining probabilities is as follows: when an expert technique is used to assess a set of hypotheses given a set of genetic data, the result of the method is a set of probabilities that are associated, in a one-to-one fashion, with each hypothesis in the set of hypotheses. When a set of probabilities that were determined by a first expert technique, each of which is associated with one of the hypotheses in the set, is combined with a set of probabilities that were determined by a second expert technique, each of which is associated with the same set of hypotheses, then the two sets of probabilities are multiplied. This means that, for each hypothesis in the set, the two probabilities that are associated with that hypothesis, as determined by the two expert techniques, are multiplied, and the corresponding product is the resulting probability. This process can be expanded to any number of expert techniques. If only one expert technique is used, then the resulting probabilities are the same as the input probabilities. If more than two expert techniques are used, then the relevant probabilities can be multiplied at the same time. The products can be normalized so that the probabilities of the hypotheses in the set of hypotheses add up to 100%.
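The multiply-and-normalize procedure described above can be sketched as follows. This is an illustrative sketch only: the function name, the use of dictionaries keyed by hypothesis label, and the example probabilities are assumptions and do not come from the present description.

```python
def combine_probabilities(*expert_outputs):
    """Combine per-hypothesis probabilities from one or more expert techniques.

    Each argument maps every hypothesis in the same set of hypotheses to a
    probability. The probabilities are multiplied per hypothesis and the
    products are normalized so that they sum to 1 (that is, 100%).
    """
    if not expert_outputs:
        raise ValueError("at least one expert technique is required")
    hypotheses = expert_outputs[0].keys()
    combined = {h: 1.0 for h in hypotheses}
    for output in expert_outputs:
        if output.keys() != hypotheses:
            raise ValueError("all techniques must cover the same hypotheses")
        for h in hypotheses:
            combined[h] *= output[h]
    total = sum(combined.values())
    if total == 0:
        raise ValueError("all combined probabilities are zero")
    return {h: p / total for h, p in combined.items()}


# Hypothetical outputs of two expert techniques over two hypotheses.
first = {"H11": 0.6, "H21": 0.4}
second = {"H11": 0.9, "H21": 0.1}
result = combine_probabilities(first, second)
```

Here the unnormalized products are 0.54 for H11 and 0.04 for H21, so after normalization H11 receives 0.54/0.58 of the probability mass.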
[000177] In some embodiments, if the combined probability for a given hypothesis is greater than the combined probability for any of the other hypotheses, then that hypothesis can be considered to be the most likely. In some embodiments, a hypothesis can be determined to be the most likely, and the ploidy state, or another genetic state, can be called if the normalized probability is greater than a threshold. In one embodiment, this may mean that the number and identity of the chromosomes that are associated with this hypothesis can be called as the ploidy state. In one embodiment, this may mean that the identity of the alleles that are associated with this hypothesis can be called as the allelic state. In some embodiments, the threshold can be between approximately 50% and approximately 80%. In some embodiments, the threshold can be between approximately 80% and approximately 90%. In some embodiments, the threshold can be between approximately 90% and approximately 95%. In some embodiments, the threshold can be between approximately 95% and approximately 99%. In some embodiments, the threshold can be between approximately 99% and approximately 99.9%. In some embodiments, the threshold may be above 99.9%.
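The thresholded call described above can be sketched as a small helper. The function name, the default threshold of 0.99, and the choice to return None for a no-call are illustrative assumptions only.

```python
def call_hypothesis(normalized_probabilities, threshold=0.99):
    """Return the most likely hypothesis if its normalized probability
    exceeds the threshold; otherwise return None (no call is made)."""
    best = max(normalized_probabilities, key=normalized_probabilities.get)
    if normalized_probabilities[best] > threshold:
        return best
    return None
```

For instance, with normalized probabilities of 0.995 for H11 and 0.005 for H21, the call is H11 at a 99% threshold, whereas probabilities of 0.7 and 0.3 yield no call at that threshold.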
Parental Contexts
[000178] The parental context refers to the genetic state of a given allele on each of the two relevant chromosomes of one or both of the target's parents. Note that in one embodiment, the parental context does not refer to the allelic state of the target; rather, it refers to the allelic state of the parents. The parental context for a given SNP can consist of four base pairs, two paternal and two maternal; they can be the same as or different from each other. It is typically written as "m1m2|f1f2", where m1 and m2 are the genetic state of the given SNP on the two maternal chromosomes, and f1 and f2 are the genetic state of the given SNP on the two paternal chromosomes. In some embodiments, the parental context can be written as "f1f2|m1m2".
Note that the subscripts "1" and "2" refer to the genotype, in the given allele, of the first and second chromosomes; it is also noted that the choice of which chromosome is marked "1" and which chromosome is marked "2" is arbitrary.
[000179] Note that in this description, A and B are often used to generically represent the identities of base pairs; A or B could equally well represent C (cytosine), G (guanine), A (adenine) or T (thymine). For example, if, at a given SNP-based allele, the mother's genotype is T at that SNP on one chromosome and G at that SNP on the homologous chromosome, and the father's genotype at that allele is G at that SNP on both homologous chromosomes, then it can be said that the target individual's allele has the parental context AB|BB; one could equally say that the allele has the parental context AB|AA. Note that, in theory, any of the four possible nucleotides could occur at a given allele, so it is possible, for example, that the mother has an AT genotype and the father has a GC genotype at a given allele. However, empirical data indicate that, in most cases, only two of the four possible base pairs are observed at a given allele. It is possible, for example when using short tandem repeats, to have more than two parental contexts, more than four, and even more than ten. In this description, the discussion assumes that only two possible base pairs will be observed at a given allele, although the embodiments described here can be modified to take into account cases where this assumption does not hold.
[000180] A "parental context" can refer to a set or subset of target SNPs that have the same parental context. For example, if 1,000 alleles were measured on a given chromosome in a target individual, then the AA|BB context could refer to the set of all alleles in the group of 1,000 alleles where the genotype of the target's mother is homozygous and the genotype of the target's father is homozygous, but where the maternal genotype and the paternal genotype are dissimilar at that locus. If the parental data is not phased, and thus AB = BA, then there are nine possible parental contexts: AA|AA, AA|AB, AA|BB, AB|AA, AB|AB, AB|BB, BB|AA, BB|AB and BB|BB. If the parental data is phased, and thus AB ≠ BA, then there are sixteen different possible parental contexts: AA|AA, AA|AB, AA|BA, AA|BB, AB|AA, AB|AB, AB|BA, AB|BB, BA|AA, BA|AB, BA|BA, BA|BB, BB|AA, BB|AB, BB|BA and BB|BB. Each SNP allele on a chromosome, excluding some SNPs on the sex chromosomes, has one of these parental contexts. The set of SNPs where the parental context for one parent is heterozygous can be called the heterozygous context.
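The counts of nine unphased and sixteen phased parental contexts given above follow from pairing every possible maternal genotype with every possible paternal genotype. The following minimal sketch reproduces both lists; the function name and the "m1m2|f1f2" string format are illustrative assumptions.

```python
from itertools import product


def parental_contexts(phased):
    """Enumerate parental contexts written as 'maternal|paternal'.

    If the data is unphased, AB and BA are the same genotype, so each
    parent has three possible genotypes; if phased, each parent has four.
    """
    genotypes = ["AA", "AB", "BA", "BB"] if phased else ["AA", "AB", "BB"]
    return [f"{m}|{f}" for m, f in product(genotypes, repeat=2)]


unphased = parental_contexts(phased=False)  # nine contexts
phased = parental_contexts(phased=True)     # sixteen contexts
```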
Use of parental contexts in NPD
[000181] Non-invasive prenatal diagnosis is an important technique that can be used to determine the genetic state of a fetus from genetic material that is obtained in a non-invasive manner, for example, from a blood draw from the pregnant mother. The blood could be separated and the plasma isolated, followed by isolation of the plasma DNA. Size selection could be used to isolate DNA of the appropriate length. The DNA can be preferentially enriched at a set of loci. That DNA can then be measured by a number of means, such as hybridization to a genotyping array and measurement of fluorescence, or by sequencing on a high-throughput sequencer.
[000182] When sequencing is used to determine the ploidy of a fetus in the context of non-invasive prenatal diagnosis, there are a number of ways to use the sequence data. The most common way that sequence data could be used is simply to count the number of reads that map to a given chromosome. For example, suppose that the DNA in the sample is comprised of 10% fetal DNA and 90% maternal DNA. In this case, one could consider the average number of reads on a chromosome that is expected to be disomic, for example, chromosome 3, and compare it to the number of reads on chromosome 21, where the reads are adjusted for the number of base pairs on that chromosome that are part of a unique sequence. If the fetus were euploid, one would expect the amount of DNA per unit of genome to be the same at all locations (subject to stochastic variations). On the other hand, if the fetus were trisomic at chromosome 21, then one would expect slightly more DNA per genetic unit from chromosome 21 than from other locations in the genome. Specifically, the fetus would contribute three copies of chromosome 21 instead of two, so one could expect approximately 5% more DNA from chromosome 21 in the mixture. When sequencing is used to measure the DNA, approximately 5% more uniquely mappable reads from chromosome 21 per unique segment would be expected than from other chromosomes. Observation of an amount of DNA from a particular chromosome that is higher than a certain threshold, when adjusted for the number of sequences that are uniquely mappable to that chromosome, could be used as the basis for a diagnosis of aneuploidy. Another method that can be used to detect aneuploidy is similar to the one above, except that parental contexts could be taken into account.
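The figure of approximately 5% in the example above can be reproduced with a short calculation. The sketch below assumes that the mother is disomic for the chromosome of interest, that the reference chromosome is disomic in both mother and fetus, and that reads are sampled without bias; the function name and parameters are hypothetical.

```python
def expected_relative_dosage(fetal_fraction, fetal_copies, maternal_copies=2):
    """Expected amount of DNA from a chromosome in the mixed sample,
    relative to a reference chromosome that is disomic in mother and fetus.

    fetal_fraction is the fraction of DNA in the mixture that is fetal.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    mixed_copies = ((1.0 - fetal_fraction) * maternal_copies
                    + fetal_fraction * fetal_copies)
    reference_copies = 2.0
    return mixed_copies / reference_copies
```

With a fetal fraction of 10%, a trisomic fetus gives a relative dosage of about 1.05, a euploid fetus gives 1.0, and a monosomic fetus gives about 0.95.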
[000183] When considering which alleles to target, one can take into account that some parental contexts are likely to be more informative than others. For example, AA|BB and the symmetric context BB|AA are the most informative contexts, because the fetus is known to carry an allele that is different from the mother's. For reasons of symmetry, both the AA|BB and BB|AA contexts can be referred to as AA|BB. Another set of informative parental contexts is AA|AB and BB|AB, because in these cases the fetus has a 50% chance of carrying an allele that the mother does not have. For reasons of symmetry, both the AA|AB and BB|AB contexts can be referred to as AA|AB. A third set of informative parental contexts is AB|AA and AB|BB, because in these cases the fetus is carrying a known paternal allele, and that allele is also present in the maternal genome. For reasons of symmetry, both the AB|AA and AB|BB contexts can be referred to as AB|AA. A fourth parental context is AB|AB, where the fetus has an unknown allelic state and, whatever the allelic state is, it is one in which the mother has the same alleles. The fifth parental context is AA|AA, where the mother and father are both homozygous for the same allele.
Different implementations of the presently described embodiments
[000184] Methods for determining the ploidy state of a target individual are described here. The target individual can be a blastomere, an embryo, or a fetus. In some embodiments of the present description, a method for determining the ploidy state of one or more chromosomes in a target individual may include any of the steps described in this document, and combinations thereof:
[000185] In some embodiments, the source of the genetic material to be used in determining the genetic state of the fetus may be fetal cells, such as nucleated fetal red blood cells, isolated from maternal blood. The method may involve taking a blood sample from the pregnant mother. 
The method may involve isolating a fetal red blood cell using visual techniques, based on the idea that a certain combination of colors is uniquely associated with the nucleated red blood cell, and that a similar combination of colors is not associated with any other cell present in maternal blood. The combination of colors associated with nucleated red blood cells can include the red color of the hemoglobin around the nucleus, a color that can be made more distinct by staining, and the color of the nuclear material, which can be stained, for example, blue. By isolating cells from maternal blood and spreading them over a slide, and then identifying those points where both red (from hemoglobin) and blue (from nuclear material) are seen, it may be possible to identify the location of nucleated red blood cells. Those nucleated red blood cells can then be extracted using a micromanipulator, and genotyping and/or sequencing techniques can be used to measure aspects of the genotype of the genetic material in these cells.
[000186] In one embodiment, one can color the nucleated red blood cell with a dye that only fluoresces in the presence of fetal hemoglobin and not maternal hemoglobin, and thus remove the ambiguity between whether the nucleated red blood cell is derived from mother or fetus. Some embodiments of the present description may involve coloring or otherwise marking nuclear material. Some embodiments of the present invention may specifically involve the labeling of fetal nuclear material using fetal cell specific antibodies.
[000187] There are many ways to isolate fetal cells from maternal blood, or fetal DNA from maternal blood, or to enrich samples of fetal genetic material in the presence of maternal genetic material. Some appropriate techniques are listed here for convenience, but this is not intended to be a complete list: using fluorescently labeled or otherwise labeled antibodies, size exclusion chromatography, magnetically labeled or otherwise labeled affinity tags, epigenetic differences, such as differential methylation between maternal and fetal cells at specific alleles, density gradient centrifugation followed by CD45/14 depletion and CD71-positive selection from CD45/14-negative cells, single or double Percoll gradients with different osmolalities, or the galactose-specific lectin method.
[000188] In one embodiment of the present description, the target individual is a fetus, and different genotype measurements are made on several DNA samples from the fetus. In some embodiments of the present description, the fetal DNA samples are from isolated fetal cells, where the fetal cells can be mixed with maternal cells. In some embodiments of the present description, the fetal DNA samples are from free floating fetal DNA, where the fetal DNA can be mixed with free floating maternal DNA. In some embodiments, the fetal DNA samples can be derived from maternal plasma or maternal blood that contains a mixture of maternal DNA and fetal DNA. In some embodiments, fetal DNA can be mixed with maternal DNA in maternal:fetal ratios in the range of 99.9:0.1% to 99:1%; 99:1% to 90:10%; 90:10% to 80:20%; 80:20% to 70:30%; 70:30% to 50:50%; 50:50% to 10:90%; 10:90% to 1:99%; or 1:99% to 0.1:99.9%.
[000189] In some modalities, the genetic sample can be prepared and / or purified. There are a number of standard procedures known in the art to achieve this. In some embodiments, the sample can be centrifuged to separate several layers. In some embodiments, DNA can be isolated using filtration. In some embodiments, DNA preparation may involve amplification, separation, chromatography purification, liquid-liquid separation, isolation, preferential enrichment, preferential amplification, targeted amplification, or any of a number of other techniques or known in the art or described herein.
[000190] In some embodiments, a method of the present description may involve amplifying DNA. DNA amplification, a process that transforms a small amount of genetic material into a larger amount of genetic material that comprises a similar set of genetic data, can be done by a wide variety of methods, including, but not limited to, polymerase chain reaction (PCR). One method of amplifying DNA is whole genome amplification (WGA). There are a number of methods available for WGA: ligation-mediated PCR (LM-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), and multiple displacement amplification (MDA). In LM-PCR, short DNA sequences called adapters are ligated to blunt ends of DNA. These adapters contain universal amplification sequences, which are used to amplify the DNA by PCR. In DOP-PCR, random primers that also contain universal amplification sequences are used in a first round of annealing and PCR. Then, a second round of PCR is used to amplify the sequences further with the universal primer sequences. MDA uses the phi-29 polymerase, which is a highly processive and non-specific enzyme that replicates DNA and has been used for single cell analysis. The major limitations to the amplification of single cell material are (1) the need to use extremely dilute DNA concentrations or an extremely small volume of reaction mixture, and (2) the difficulty of reliably dissociating DNA from proteins across the whole genome. Regardless, single cell whole genome amplification has been used successfully for a variety of applications for a number of years. There are other methods of amplifying DNA from a DNA sample. DNA amplification transforms the initial DNA sample into a DNA sample that is similar in its set of sequences, but of much greater quantity. In some cases, amplification may not be required.
[000191] In some embodiments, DNA can be amplified using universal amplification, such as WGA or MDA. In some embodiments, DNA can be amplified by targeted amplification, for example, using targeted PCR or circularizing probes. In some embodiments, DNA can be preferentially enriched using a targeted amplification method, or a method that results in complete or partial separation of desired DNA from undesired DNA, such as capture by hybridization approaches. In some embodiments, DNA can be amplified using a combination of a universal amplification method and a preferential enrichment method. A more complete description of some of these methods can be found elsewhere in this document.
[000192] The genetic data of the target individual and/or the related individual can be transformed from a molecular state to an electronic state by measuring the appropriate genetic material using tools and/or techniques taken from a group that includes, but is not limited to: genotyping microarrays and high-throughput sequencing. Some high-throughput sequencing methods include Sanger DNA sequencing, pyrosequencing, the ILLUMINA SOLEXA platform, ILLUMINA's GENOME ANALYZER, APPLIED BIOSYSTEM's 454 sequencing platform, HELICOS's TRUE SINGLE MOLECULE SEQUENCING platform, HALCYON MOLECULAR's electron microscope sequencing method, or any other sequencing method. All of these methods physically transform the genetic data stored in a DNA sample into a set of genetic data that is typically stored in a memory device en route to being processed.
[000193] The genetic data of the relevant individual can be measured by analyzing substances obtained from a group including, but not limited to: the individual's diploid tissue, one or more individual's diploid cells, one or more individual's haploid cells , one or more blastomers of the target individual, extracellular genetic material found in the individual, extracellular genetic material of the individual found in maternal blood, cells of the individual found in maternal blood, one or more embryos created from the gamete (s) of the related individual, one or more blastomeres obtained from such an embryo, extracellular genetic material found in the related individual, genetic material known to have originated from the related individual, and combinations thereof.
[000194] In some embodiments, a set of at least one ploidy state hypothesis can be created for each of the chromosome types of interest in the target individual. Each of the ploidy state hypotheses can refer to one possible ploidy state of the chromosome or chromosome segment of the target individual. The set of hypotheses may include some or all of the possible ploidy states that the chromosome of the target individual may have. Some of the possible ploidy states may include nullisomy, monosomy, disomy, uniparental disomy, euploidy, trisomy, matching trisomy, unmatching trisomy, maternal trisomy, paternal trisomy, tetrasomy, balanced (2:2) tetrasomy, unbalanced (3:1) tetrasomy, pentasomy, hexasomy, other aneuploidy, and combinations thereof. Any of these aneuploidy states can be mixed or partial aneuploidy, such as unbalanced translocations, balanced translocations, Robertsonian translocations, recombinations, deletions, insertions, crossovers, and combinations thereof.
[000195] In some embodiments, knowledge of the determined ploidy state can be used to make a clinical decision. This knowledge, typically stored as a physical arrangement of matter in a memory device, can then be transformed into a report. The report can then be acted upon. For example, the clinical decision may be to terminate the pregnancy; alternatively, the clinical decision may be to continue the pregnancy. In some embodiments, the clinical decision may involve an intervention designed to decrease the severity of the phenotypic presentation of a genetic disorder, or a decision to take relevant steps to prepare for a child with special needs.
[000196] In one embodiment of the present description, any of the methods described here can be modified to allow multiple targets to come from the same target individual, for example, multiple blood draws from the same pregnant mother. This can improve the accuracy of the model, as multiple genetic measurements can provide more data with which the target genotype can be determined. In one embodiment, one set of target genetic data serves as the primary data that is reported, and the others serve as data for double-checking the primary target genetic data. In one embodiment, several sets of genetic data, each measured from genetic material obtained from the target individual, are considered in parallel, and thus all sets of target genetic data serve to help determine which sections of the parental genetic data, measured with high accuracy, make up the fetal genome.
[000197] In one embodiment, the method can be used for the purpose of paternity testing. For example, given the SNP-based genotypic information from the mother, and from a man who may or may not be the genetic father, and the genotypic information measured from the mixed sample, it is possible to determine whether the man's genotypic information actually represents the genetic father of the gestating fetus. A simple way to do this is to look at the contexts where the mother is AA, and the possible father is AB or BB. In these cases, one can expect to see the father's contribution half of the time (AA|AB) or all of the time (AA|BB), respectively. Taking into account the expected allele dropout (ADO), it is straightforward to determine whether or not the fetal SNPs that are observed are correlated with those of the possible father.
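The expectation described above for the contexts in which the mother is AA can be expressed numerically. The sketch below is illustrative only: it assumes that the alleged father is the genetic father, that the fetus is disomic, and that there is no allele dropout or amplification bias, and the function name and parameters are hypothetical.

```python
def expected_b_fraction(fetal_fraction, father_genotype):
    """Expected fraction of B alleles in the mixed sample, averaged over
    SNPs where the mother is AA and the father has the given genotype.

    The mother contributes only A alleles, so every B allele is fetal and
    paternally inherited. A BB father always transmits B, and an AB father
    transmits B half of the time.
    """
    if father_genotype not in ("AA", "AB", "BB"):
        raise ValueError("father_genotype must be AA, AB or BB")
    probability_b_transmitted = father_genotype.count("B") / 2
    # A fetus that inherited B is AB, so half of its alleles are B.
    return probability_b_transmitted * fetal_fraction / 2
```

Under these assumptions and a fetal fraction of 10%, the expected B allele fraction is about 5% in the AA|BB context and about 2.5% on average in the AA|AB context; observed fractions far from these values at many SNPs would argue against the alleged father being the genetic father.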
[000198] One embodiment of the present description could be as follows: a pregnant woman wants to know whether her fetus is afflicted with Down Syndrome, and/or whether it will suffer from Cystic Fibrosis, and she does not want to bear a child who is afflicted with either of these conditions. A doctor draws her blood, stains the hemoglobin with one marker so that it appears clearly red, and stains the nuclear material with another marker so that it appears clearly blue. Knowing that maternal red blood cells are typically anuclear, while a high proportion of fetal cells contain a nucleus, the doctor is able to visually isolate a number of nucleated red blood cells by identifying those cells that show both a red and a blue color. The doctor picks up these cells from the slide with a micromanipulator and sends them to a laboratory that amplifies and genotypes ten individual cells. By using the genetic measurements, the PARENTAL SUPPORT® method is able to determine that six of the ten cells are maternal blood cells, and four of the ten cells are fetal cells. If a child has already been born to the pregnant woman, the PARENTAL SUPPORT® method can also be used to determine that the fetal cells are distinct from the cells of the born child, by making reliable allele calls on the fetal cells and showing that they are dissimilar to those of the born child. Note that this method is similar in concept to the paternity testing embodiment of the present description. The genetic data measured from the fetal cells can be of very poor quality, comprising many allele dropouts, due to the difficulty of genotyping single cells. The doctor is able to use the measured fetal DNA along with the reliable DNA measurements of the parents to infer aspects of the genome of the fetus with high accuracy using PARENTAL SUPPORT®, thereby transforming the genetic data contained in the genetic material from the fetus into the predicted genetic state of the fetus, stored on a computer. 
The doctor is able to determine both the ploidy state of the fetus and the presence or absence of a plurality of disease-linked genes of interest. The fetus turns out to be euploid and not a carrier of cystic fibrosis, and the mother decides to continue the pregnancy.
[000199] In one embodiment of the present description, a pregnant mother wants to determine whether her fetus is afflicted with any whole-chromosome abnormalities. She goes to her doctor and gives a sample of her blood, and she and her husband provide samples of their own DNA from cheek swabs. A laboratory researcher genotypes the parental DNA, using the MDA protocol to amplify the parental DNA and ILLUMINA INFINIUM arrays to measure the parents' genetic data at a large number of SNPs. The researcher then centrifuges the blood, takes the plasma, and isolates a sample of free-floating DNA using size exclusion chromatography. Alternatively, the researcher uses one or more fluorescent antibodies, such as one that is specific to fetal hemoglobin, to isolate a nucleated fetal red blood cell. The researcher then takes the isolated or enriched fetal genetic material and amplifies it using a library of appropriately designed 70-mer oligonucleotides, such that the two ends of each oligonucleotide correspond to the flanking sequences on either side of the target allele. Upon addition of a polymerase, a ligase, and the appropriate reagents, the oligonucleotides undergo gap-filling circularization, capturing the desired allele. An exonuclease is added and then heat-inactivated, and the products are used directly as a template for PCR amplification. The PCR products are sequenced on an ILLUMINA GENOME ANALYSER. The sequence reads are used as input to the PARENTAL SUPPORT® method, which then predicts the ploidy state of the fetus.
[000200] In another embodiment, a couple, where the mother is pregnant and of advanced maternal age, wants to know whether the gestating fetus has Down syndrome, Turner syndrome, Prader-Willi syndrome, or some other whole-chromosome abnormality. The obstetrician takes a blood draw from the mother and the father. The blood is sent to a laboratory, where a technician centrifuges the maternal sample to isolate the plasma and the buffy coat. The DNA in the buffy coat and in the paternal blood sample is transformed through amplification, and the genetic data encoded in the amplified genetic material is further transformed from molecularly stored genetic data into electronically stored genetic data by running the genetic material on a high-throughput sequencer to measure the parental genotypes. The plasma sample is preferentially enriched at a set of loci using a 5,000-plex hemi-nested targeted PCR method. The mixture of DNA fragments is prepared into a DNA library suitable for sequencing. The DNA is then sequenced using a high-throughput sequencing method, for example, the ILLUMINA GAIIx GENOME ANALYSER. The sequencing transforms the information that is encoded molecularly in the DNA into information that is encoded electronically in computer hardware. A computer-based technique that includes the presently described embodiments, such as PARENTAL SUPPORT®, can be used to determine the ploidy state of the fetus. 
This may involve calculating, on a computer, allele count probabilities at the plurality of polymorphic loci from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses, each pertaining to a different possible ploidy state of the chromosome; building, on a computer, a joint distribution model for the expected allele counts at the plurality of polymorphic loci on the chromosome for each ploidy hypothesis; determining, on a computer, a relative probability of each ploidy hypothesis using the joint distribution model and the allele counts measured on the prepared sample; and calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability. The fetus is determined to have Down syndrome. A report is printed out, or sent electronically to the pregnant woman's doctor, who conveys the diagnosis to the woman. The woman, her husband, and the doctor sit down and discuss their options. The couple decides to terminate the pregnancy based on the knowledge that the fetus is afflicted with a trisomic condition.
[000201] In one embodiment, a company may decide to offer a diagnostic technology designed to detect aneuploidy in a gestating fetus from a maternal blood draw. Its product may involve a mother presenting to her obstetrician, who can draw her blood. The obstetrician may also collect a genetic sample from the father of the fetus. A clinician can isolate the plasma from the maternal blood and purify the DNA from the plasma. A clinician can also isolate the buffy coat from the maternal blood and prepare the DNA from the buffy coat. A clinician can also prepare the DNA from the paternal genetic sample. The clinician can use molecular biology techniques described in this description to append universal amplification tags to the DNA derived from the plasma sample. The clinician can amplify the universally tagged DNA. 
The clinician can preferentially enrich the DNA by a number of techniques, including capture by hybridization and targeted PCR. The targeted PCR can involve nesting, hemi-nesting or semi-nesting, or any other approach that results in efficient enrichment of the plasma-derived DNA. The targeted PCR can be massively multiplexed, for example, with 10,000 primers in one reaction, where the primers target SNPs on chromosomes 13, 18, 21, and X, and those loci that are common to both X and Y, and optionally other chromosomes as well. The selective enrichment and/or amplification can involve tagging each individual molecule with different tags, molecular barcodes, tags for amplification, and/or tags for sequencing. The clinician can then sequence the plasma sample, and possibly also the prepared maternal and/or paternal DNA. The molecular biology steps can be performed either wholly or in part by a diagnostic box. The sequence data can be fed into a single computer, or into another type of computing platform such as may be found 'in the cloud'. The computing platform can calculate the allele counts at the targeted polymorphic loci from the measurements made by the sequencer. The computing platform can create a plurality of ploidy hypotheses pertaining to nullisomy, monosomy, disomy, matched trisomy, and unmatched trisomy for each of chromosomes 13, 18, 21, X and Y. The computing platform can build a joint distribution model for the expected allele counts at the targeted loci on the chromosome for each ploidy hypothesis for each of the five chromosomes being interrogated. The computing platform can determine a probability that each ploidy hypothesis is true using the joint distribution model and the allele counts measured on the preferentially enriched DNA derived from the plasma sample. The computing platform can call the ploidy state of the fetus for each of chromosomes 13, 18, 21, X and Y by selecting the ploidy state corresponding to the relevant hypothesis with the greatest probability. 
A report can be generated comprising the ploidy states determined, and can be sent to the obstetrician electronically, displayed on an output device, or a printed copy of the report can be delivered to the obstetrician. The obstetrician can inform the patient and optionally the father of the fetus, and they can decide which clinical options are open to them, and which is more desirable.
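The hypothesis-selection step described in this embodiment can be sketched in Python under heavy simplification. The sketch below is not the joint distribution model of the present description: it assumes, purely for illustration, that only loci where the mother is AA and the father is BB are used, that loci are independent, that read counts are binomial, and that the fetal fraction is known. It also distinguishes hypotheses only by the number of maternal and paternal chromosome copies, not by matched or unmatched trisomy. All names are hypothetical.

```python
import math

# Hypothetical hypothesis table: (maternal copies, paternal copies) in the fetus.
HYPOTHESES = {
    "maternal monosomy": (1, 0),
    "paternal monosomy": (0, 1),
    "disomy": (1, 1),
    "maternal trisomy": (2, 1),
    "paternal trisomy": (1, 2),
}


def expected_b_fraction(n_mat, n_pat, fetal_fraction):
    """Expected B-read fraction at a locus where mother is AA and father is BB.

    fetal_fraction is taken here as the share of genome equivalents in the
    mixture that are fetal (an assumption of this sketch).
    """
    f = fetal_fraction
    # Mother contributes 2 A copies; fetus contributes n_mat A and n_pat B copies.
    total_copies = 2.0 * (1.0 - f) + (n_mat + n_pat) * f
    return n_pat * f / total_copies


def log_likelihood(counts, p, error_rate=0.001):
    """Binomial log-likelihood of (b_reads, total_reads) pairs, up to a constant.

    The binomial coefficient is omitted because it is the same for every
    hypothesis. p is clipped so that sequencing errors never give log(0).
    """
    p = min(max(p, error_rate), 1.0 - error_rate)
    return sum(b * math.log(p) + (n - b) * math.log(1.0 - p) for b, n in counts)


def call_ploidy(counts, fetal_fraction):
    """Return the hypothesis with the greatest likelihood, plus all scores."""
    scores = {
        name: log_likelihood(counts, expected_b_fraction(m, p, fetal_fraction))
        for name, (m, p) in HYPOTHESES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

For example, `call_ploidy([(50, 1000)] * 20, fetal_fraction=0.1)` scores each hypothesis against twenty loci with 50 B reads out of 1,000 and returns the best-scoring one. A fuller treatment would model every parental genotype context jointly and would account for linkage between loci, as the description indicates.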
[000202] In one embodiment, a pregnant woman, hereinafter called "the mother", may decide that she wants to know whether or not her fetus(es) are carrying any genetic abnormalities or other conditions. She may wish to ensure that there are no gross abnormalities before she is confident to continue the pregnancy. She can go to her obstetrician, who can take a sample of her blood. He can also take a genetic sample, such as a buccal swab, from her cheek. He can also take a genetic sample from the father of the fetus, such as a buccal swab, a sperm sample, or a blood sample. He can send the samples to a clinician. The clinician can enrich the fraction of free-floating fetal DNA in the maternal blood sample. The clinician can enrich the fraction of enucleated fetal blood cells in the maternal blood sample. The clinician can use various aspects of the methods described here to determine the genetic data of the fetus. Such genetic data may include the ploidy state of the fetus and/or the identity of one or a number of disease-linked alleles in the fetus. A report can be generated summarizing the results of the prenatal diagnosis. The report can be transmitted or emailed to the doctor, who can tell the mother the genetic state of the fetus. The mother may decide to discontinue the pregnancy based on the fact that the fetus has one or more chromosomal or genetic abnormalities, or undesirable conditions. She may also decide to continue the pregnancy based on the fact that the fetus does not have any gross chromosomal or genetic abnormalities, or any genetic conditions of interest.
[000203] Another example may involve a pregnant woman who has been artificially inseminated by a sperm donor. She wants to minimize the risk that the fetus she is carrying has a genetic disease. She has her blood drawn by a phlebotomist, the techniques described in this description are used to isolate three nucleated fetal red blood cells, and a tissue sample is also collected from the mother and from the genetic father. The genetic material from the fetus and from the mother and father is amplified as appropriate and genotyped using the ILLUMINA INFINIUM BEADARRAY, and the methods described here clean and phase the parental and fetal genotypes with high accuracy, as well as making ploidy determinations for the fetus. The fetus is found to be euploid, phenotypic susceptibilities are predicted from the reconstructed fetal genotype, and a report is generated and sent to the mother's doctor so that they can decide which clinical decisions may be best.
[000204] In one embodiment, the raw genetic material of the mother and father is transformed by amplification into a quantity of DNA that is similar in sequence, but greater in quantity. Then, using a genotyping method, the genotypic data that is encoded by nucleic acids is transformed into genetic measurements that can be stored physically and / or electronically on a memory device, such as those described above. The relevant algorithms that make up the PARENTAL SUPPORT® algorithm, the relevant parts of which are discussed in detail here, are translated into a computer program, using a programming language.
Then, through the execution of the computer program on the computer hardware, the physically encoded bits and bytes, instead of being arranged in a pattern that represents raw measurement data, are transformed into a pattern that represents a high-confidence determination of the ploidy state of the fetus. The details of this transformation will depend on the data itself and on the computer language and hardware system used to execute the method described here. Then, the data that is physically configured to represent a high-quality ploidy determination of the fetus is transformed into a report that can be sent to a healthcare professional. This transformation can be performed using a printer or a computer display. The report can be a printed copy, on paper or another suitable medium, or it can be electronic. In the case of an electronic report, it can be transmitted; it can be physically stored on a memory device at a location on the computer accessible to the healthcare professional; or it can be displayed on a screen so that it can be read. In the case of a screen display, the data can be transformed into a readable format by causing the physical transformation of pixels on the display device. The transformation can be accomplished by physically firing electrons at a phosphorescent screen, or by altering an electric charge that physically changes the transparency of a specific set of pixels on a screen that may lie in front of a substrate that emits or absorbs photons. This transformation can be accomplished by changing the nanoscale orientation of the molecules in a liquid crystal, for example, from nematic to cholesteric or smectic phase, at a specific set of pixels. This transformation can be accomplished by an electric current causing photons to be emitted from a specific set of pixels made from a plurality of light-emitting diodes arranged in a meaningful pattern. 
This transformation can be accomplished by any other means used to display information, such as a computer screen, or some other output device or means of transmitting information. The healthcare professional can then act on the report, such that the data in the report is transformed into an action. The action may be to continue or discontinue the pregnancy, in which case a gestating fetus with a genetic abnormality is transformed into a non-living fetus. The transformations listed here can be aggregated, such that, for example, one can transform the genetic material of a pregnant mother and the father, through a number of steps described in this description, into a medical decision consisting of aborting a fetus with genetic abnormalities, or consisting of continuing the pregnancy. Alternatively, a set of genotypic measurements can be transformed into a report that helps a physician treat his pregnant patient.
[000205] In one embodiment of the present description, the method described here can be used to determine the ploidy state of a fetus even when the host mother, that is, the woman who is pregnant, is not the biological mother of the fetus she is carrying. In one embodiment of the present description, the method described here can be used to determine the ploidy state of a fetus using only the maternal blood sample, and without the need for a paternal genetic sample.
[000206] Some of the mathematics in the presently described embodiments makes hypotheses concerning a limited number of states of aneuploidy. In some cases, for example, only zero, one or two chromosomes are expected to originate from each parent. In some embodiments of the present description, the mathematical derivations can be expanded to take into account other forms of aneuploidy, such as quadrosomy, where three chromosomes originate from one parent, pentasomy, hexasomy, etc., without changing the fundamental concepts of the present description. At the same time, it is possible to focus on a smaller number of ploidy states, for example, only trisomy and disomy. Note that ploidy determinations that indicate a non-whole number of chromosomes may indicate mosaicism in a sample of genetic material.
[000207] In some embodiments, the genetic abnormality is a type of aneuploidy or other chromosomal abnormality, such as Down syndrome (trisomy 21), Edwards syndrome (trisomy 18), Patau syndrome (trisomy 13), Turner syndrome (45X), Klinefelter syndrome (a male with 2 X chromosomes), Prader-Willi syndrome (which can result from UPD 15), and DiGeorge syndrome. Congenital disorders, such as those listed in the previous sentence, are generally undesirable, and the knowledge that a fetus is afflicted with one or more phenotypic abnormalities can provide the basis for a decision to terminate the pregnancy, to take necessary precautions to prepare for the birth of a child with special needs, or to take some therapeutic approach to decrease the severity of a chromosomal abnormality.
[000208] In some embodiments, the methods described here can be used at a very early gestational age, for example, as early as four weeks, five weeks, six weeks, seven weeks, eight weeks, nine weeks, ten weeks, eleven weeks and twelve weeks.
[000209] Note that it has been shown that the DNA that originated from cancer that is living in a host can be found in the blood of the host. In the same way that genetic diagnostics can be made from measuring mixed DNA found in maternal blood, genetic diagnostics can also be made from measuring mixed DNA found in the host's blood. Genetic diagnoses can include states of aneuploidy, or gene mutations. Any claim in the present description that aims to determine the status of ploidy or genetic status of a fetus from measurements made in maternal blood can also determine the state of ploidy or genetic status of a cancer from measurements in the blood of the host.
[000210] In some embodiments, a method of the present description allows the ploidy state of a cancer to be determined. The method includes obtaining a mixed sample that contains genetic material from the host and genetic material from the cancer; measuring the DNA in the mixed sample; calculating the fraction of DNA in the mixed sample that is of cancer origin; and determining the ploidy state of the cancer using the measurements made on the mixed sample and the calculated fraction. In some embodiments, the method may also include administering a cancer therapeutic based on the determination of the ploidy state of the cancer. In some embodiments, the cancer therapeutic is selected from the group comprising a pharmaceutical, a biologic therapeutic, an antibody-based therapy, and combinations thereof.
[000211] In some embodiments, a method described here is used in the context of pre-implantation genetic diagnosis (PGD) for embryo selection during in vitro fertilization, where the target individual is an embryo, and the parental genotypic data can be used to make ploidy determinations about the embryo from sequencing data from a single-cell or two-cell biopsy from a day-3 embryo, or from a trophectoderm biopsy from a day-5 or day-6 embryo. In PGD, only the child's DNA is measured, and only a small number of cells are tested, generally one to five, but as many as ten, twenty or fifty. The total number of starting copies of the A and B alleles (at a SNP) is then trivially determined by the child's genotype and the number of cells. In NIPD, the number of starting copies is very high, and so the allele ratio after PCR is expected to accurately reflect the starting ratio. However, the small number of starting copies in PGD means that contamination and imperfect PCR efficiency have a non-trivial effect on the allele ratio after PCR. This effect may be more important than the depth of read in predicting the variance in the allele ratio measured after sequencing. The distribution of the measured allele ratio given a known child genotype can be created by Monte Carlo simulation of the PCR process, based on the PCR probe efficiency and the probability of contamination. Given the allele ratio distribution for each possible child genotype, the probabilities of the various hypotheses can be calculated as described for NIPD.
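A minimal Monte Carlo sketch of this idea is given below in Python. It is an illustration under assumptions that are not stated in the present description: each molecule is copied in each cycle with a fixed probability (the hypothetical `efficiency` parameter), contamination is modeled as at most one foreign molecule of a random allele added before amplification, and only a few early cycles are simulated because they dominate the variance. All function and parameter names are hypothetical.

```python
import random


def simulate_allele_ratio(n_a, n_b, n_cells, efficiency, p_contam,
                          cycles=8, rng=None):
    """Draw one post-PCR B-allele ratio for a known child genotype.

    n_a and n_b are the copies of alleles A and B per cell (e.g. 1 and 1
    for a disomic AB genotype).
    """
    if rng is None:
        rng = random.Random()
    a = n_a * n_cells
    b = n_b * n_cells
    # Assumed contamination model: one extra molecule, A or B with equal odds.
    if rng.random() < p_contam:
        if rng.random() < 0.5:
            a += 1
        else:
            b += 1
    # Each existing molecule is duplicated with probability `efficiency`.
    for _ in range(cycles):
        a += sum(rng.random() < efficiency for _ in range(a))
        b += sum(rng.random() < efficiency for _ in range(b))
    total = a + b
    return b / total if total else float("nan")


def allele_ratio_distribution(n_a, n_b, n_cells, efficiency, p_contam,
                              trials=1000, seed=0):
    """Empirical distribution of the B-allele ratio over many simulated PCRs."""
    rng = random.Random(seed)
    return [
        simulate_allele_ratio(n_a, n_b, n_cells, efficiency, p_contam, rng=rng)
        for _ in range(trials)
    ]
```

Calling `allele_ratio_distribution(1, 1, 1, 0.7, 0.05)` yields simulated ratios for a single AB cell; repeating this for each possible child genotype gives the per-genotype distributions against which measured ratios could be compared. The spread of the simulated ratios narrows as `n_cells` grows, which reflects the contrast drawn above between PGD and NIPD.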
[000212] Any of the embodiments described here can be implemented in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, or combinations thereof. The apparatus of the presently described embodiments can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and the method steps of the presently described embodiments can be performed by a programmable processor executing a program of instructions to perform the functions of the presently described embodiments by operating on input data and generating output. The presently described embodiments can be implemented advantageously in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in machine language if desired; and in any case, the language can be a compiled or interpreted language. A computer program can be deployed in any form, including as a stand-alone program, or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed or interpreted on one computer or on multiple computers at one site, or distributed across multiple sites and interconnected by a communication network.
[000213] Computer-readable storage media, as used here, refer to physical or tangible storage (as opposed to signals) and include, without limitation, volatile and non-volatile, removable and non-removable media implemented in any method or technology for the tangible storage of information such as computer-readable instructions, data structures, program modules or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical or material medium that can be used to tangibly store the desired information or data or instructions and that can be accessed by a computer or processor.
[000214] Any of the methods described here can include the output of data in a physical format, such as on a computer screen or on a paper printout. In the explanations of any of the embodiments in this document, it should be understood that the described methods can be combined with the output of actionable data in a format that can be acted upon by a physician. In addition, the described methods can be combined with the actual execution of a clinical decision that results in a clinical treatment, or the execution of a clinical decision to take an action. Some of the embodiments described in the document for determining genetic data pertaining to a target individual can be combined with the decision to select one or more embryos for transfer in the context of IVF, optionally combined with the process of transferring the embryo to the uterus of the prospective mother. Some of the embodiments described in the document for determining genetic data pertaining to a target individual can be combined with the notification of a potential chromosomal abnormality, or lack thereof, to a medical professional, optionally combined with the decision to abort, or not to abort, a fetus in the context of prenatal diagnosis. Some of the embodiments described here can be combined with the output of actionable data, and the execution of a clinical decision that results in a clinical treatment, or the execution of a clinical decision to take an action.
Targeted enrichment and sequencing
[000215] The use of a technique to enrich a sample of DNA at a set of target loci, followed by sequencing, as part of a method for non-invasive prenatal allele determination or ploidy determination can confer a number of unexpected advantages. In some embodiments of the present description, the method involves measuring genetic data for use with a computer-based method, such as PARENTAL SUPPORT® (PS). The end result of some of the embodiments is the actionable genetic data of an embryo or fetus. There are many methods that can be used to measure the genetic data of the individual and/or the related individuals as part of the embodied methods. In one embodiment, a method for enriching the concentration of a set of targeted alleles is described here, the method comprising one or more of the following steps: targeted amplification of genetic material, addition of locus-specific oligonucleotide probes, ligation of DNA strands, isolation of sets of desired DNA, removal of unwanted components from a reaction, detection of certain DNA sequences by hybridization, and detection of the sequence of one or a plurality of DNA strands by DNA sequencing methods. In some cases, the DNA strands can refer to the target genetic material, in some cases they can refer to primers, in some cases they can refer to synthesized sequences, or combinations thereof. These steps can be carried out in a number of different orders. Given the highly variable nature of molecular biology, it is generally not obvious which methods, and which combinations of steps, will perform poorly, well, or best in various situations.
[000216] For example, a universal amplification step of the DNA prior to targeted amplification can confer several advantages, such as removing the risk of bottlenecking and reducing allelic bias. The DNA can be mixed with an oligonucleotide probe that can hybridize to two neighboring regions of the target sequence, one on either side. After hybridization, the ends of the probe can be connected by adding a polymerase, a means for ligation, and any necessary reagents to allow the circularization of the probe. After circularization, an exonuclease can be added to digest the non-circularized genetic material, followed by detection of the circularized probe. The DNA can be mixed with PCR primers that can hybridize to two neighboring regions of the target sequence, one on either side. After hybridization, the ends of the probe can be connected by adding a polymerase, a means for ligation, and any necessary reagents to complete PCR amplification. Amplified or unamplified DNA can be targeted by hybrid capture probes that target a set of loci; after hybridization, the probe can be localized and separated from the mixture to provide a mixture of DNA that is enriched in target sequences.
[000217] In some modalities, the detection of the target genetic material can be done in a multiplexed way. The number of genetic target sequences that can be executed in parallel can be in the range of one to ten, ten to one hundred, one hundred to one thousand, one thousand to ten thousand, ten thousand to one hundred thousand, one hundred thousand to one million, or one million to ten million. It is noted that the prior art includes descriptions of successful multiplexed PCR reactions involving groups of up to approximately 50 or 100 primers, and no more. Previous attempts to multiplex more than 100 primers per group have resulted in significant problems with unwanted side reactions such as primer-dimer formation.
[000218] In some embodiments, this method can be used to genotype a single cell, a small number of cells, two to five cells, six to ten cells, ten to twenty cells, twenty to fifty cells, fifty to one hundred cells, one hundred to one thousand cells, or a small amount of extracellular DNA, for example, one to ten picograms, ten to one hundred picograms, one hundred picograms to one nanogram, one to ten nanograms, ten to one hundred nanograms, or one hundred nanograms to one microgram.
[000219] The use of a method to target certain loci, followed by sequencing, as part of a method for allele determination or ploidy determination can confer a number of unexpected advantages. Some methods by which DNA can be targeted, or preferentially enriched, include the use of circularizing probes, linked inverted probes (LIPs, MIPs), capture by hybridization methods such as SURESELECT, and targeted PCR or ligation-mediated PCR amplification strategies.
[000220] In some embodiments, a method of the present description involves measuring genetic data for use with a computer-based method, such as PARENTAL SUPPORT® (PS). The PARENTAL SUPPORT® method is a computer-based approach to manipulating genetic data, aspects of which are described here. The end result of some of the embodiments is the actionable genetic data of an embryo or fetus, followed by a clinical decision based on the actionable data. The algorithms behind the PS method take the measured genetic data of the target individual, often an embryo or fetus, and the measured genetic data from related individuals, and are able to increase the accuracy with which the genetic state of the target individual is known. In one embodiment, the measured genetic data is used in the context of making ploidy determinations during prenatal genetic diagnosis. In one embodiment, the measured genetic data is used in the context of making ploidy determinations or allele determinations on embryos during in vitro fertilization. There are many methods that can be used to measure the genetic data of the individual and/or the related individuals in the contexts mentioned above. The different methods comprise a number of steps, which often involve amplification of genetic material, addition of oligonucleotide probes, ligation of specified DNA strands, isolation of sets of desired DNA, removal of unwanted components from a reaction, detection of certain DNA sequences by hybridization, and detection of the sequence of one or more strands of DNA by DNA sequencing methods. In some cases, the DNA strands can refer to target genetic material, in some cases they can refer to primers, in some cases they can refer to synthesized sequences, or combinations thereof. These steps can be carried out in a number of different orders. Given the highly variable nature of molecular biology, it is generally not obvious which methods, and which combinations of steps, will perform poorly, well, or best in various situations.
[000221] Note that in theory it is possible to target any number of loci in the genome, anywhere from one locus to well over one million loci. If a sample of DNA is subjected to targeting and then sequenced, the percentage of the alleles read by the sequencer will be enriched with respect to their natural abundance in the sample. The degree of enrichment can range from one percent (or even less) to ten-fold, a hundred-fold, a thousand-fold, or even many million-fold. In the human genome, there are approximately 3 billion base pairs, or nucleotides, comprising approximately 75 million polymorphic loci. The more loci that are targeted, the smaller the degree of enrichment that is possible. The fewer loci that are targeted, the greater the degree of enrichment that is possible, and the greater the depth of read that can be achieved at those loci for a given number of sequence reads.
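The trade-off between the number of targeted loci and the depth of read can be made concrete with a short arithmetic sketch in Python. The on-target fraction, the number of bases covered per locus, and the function names are assumptions introduced only for this illustration; the genome size of approximately 3 billion base pairs is taken from the paragraph above.

```python
GENOME_BASES = 3_000_000_000  # approximate human genome size, as stated above


def mean_depth_per_locus(total_reads, n_loci, on_target_fraction):
    """Average number of reads per targeted locus.

    on_target_fraction is the assumed share of reads that map to a
    targeted locus after enrichment.
    """
    return total_reads * on_target_fraction / n_loci


def fold_enrichment(on_target_fraction, n_loci, bases_per_locus=100):
    """Enrichment relative to natural abundance of the targeted bases.

    bases_per_locus is a hypothetical footprint of each targeted locus.
    """
    natural_fraction = n_loci * bases_per_locus / GENOME_BASES
    return on_target_fraction / natural_fraction
```

With 10 million reads and an assumed on-target fraction of 50%, targeting 10,000 loci gives an average of 500 reads per locus, whereas targeting 1,000 loci gives 5,000 reads per locus, which illustrates why fewer targeted loci allow a greater depth of read for a given number of sequence reads.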
[000222] In one embodiment of the present description, the targeting or preferential enrichment may focus entirely on SNPs. In one embodiment, the targeting or preferential enrichment may focus on any polymorphic site. A number of commercial targeting products are available to enrich exons. Surprisingly, targeting exclusively SNPs, or exclusively polymorphic loci, is particularly advantageous when using a method for NIPD that relies on allele distributions. There are also published methods for NIPD using sequencing, for example, US Patent No. 7,888,017, involving a read count analysis where the read counting focuses on counting the number of reads that map to a given chromosome, and where the analyzed sequence reads do not focus on regions of the genome that are polymorphic. These types of methodology, which do not focus on polymorphic alleles, do not benefit from targeting or preferential enrichment of a set of alleles.
[000223] In one embodiment of the present description, it is possible to use a targeting method that focuses on SNPs to enrich a genetic sample in polymorphic regions of the genome. In one embodiment, it is possible to focus on a small number of SNPs, for example, between 1 and 100 SNPs, or a larger number, for example, between 100 and 1,000, between 1,000 and 10,000, between 10,000 and 100,000, or more than 100,000 SNPs. In one embodiment, it is possible to focus on one or a small number of chromosomes that are correlated with live trisomic births, for example, chromosomes 13, 18, 21, X and Y, or some combination thereof. In one embodiment, it is possible to enrich the targeted SNPs by a small factor, for example, between 1.01-fold and 100-fold, or by a larger factor, for example, between 100-fold and 1,000,000-fold, or even by more than 1,000,000-fold. In one embodiment of the present description, it is possible to use a targeting method to create a DNA sample that is preferentially enriched in polymorphic regions of the genome. In one embodiment, it is possible to use this method to create a mixture of DNA with any of these characteristics where the mixture of DNA contains maternal DNA and also free-floating fetal DNA. In one embodiment, it is possible to use this method to create a mixture of DNA that has any combination of these factors. For example, the method described here can be used to produce a mixture of DNA that comprises maternal DNA and fetal DNA, and that is preferentially enriched in DNA that corresponds to 200 SNPs, all of which are located on either chromosome 18 or 21, and which are enriched an average of 1,000-fold. In another example, it is possible to use the method to create a mixture of DNA that is preferentially enriched in 10,000 SNPs that are all or mostly located on chromosomes 13, 18, 21, X and Y, where the average enrichment per locus is greater than 500-fold. Any of the targeting methods described here can be used to create mixtures of DNA that are preferentially enriched at certain loci.
[000224] In some embodiments, a method of the present description further includes measuring DNA in the mixed fraction using a high-throughput DNA sequencer, where the DNA in the mixed fraction contains a disproportionate number of sequences from one or more chromosomes, where one or more chromosomes are obtained from the group consisting of chromosome 13, chromosome 18, chromosome 21, X chromosome, Y chromosome and combinations thereof.
[000225] Three methods are described here: multiplex PCR, targeted capture by hybridization, and linked inverted probes (LIPs), which can be used to obtain and analyze measurements from a sufficient number of polymorphic loci in a maternal plasma sample in order to detect fetal aneuploidy; this is not intended to exclude other methods of selective enrichment of targeted loci. Other methods can also be used without changing the essence of the method. In each case, the polymorphisms tested may include single nucleotide polymorphisms (SNPs), small insertions / deletions, or STRs. A preferred method involves using SNPs. Each approach produces allelic frequency data; the allelic frequency data for each targeted locus and / or the joint allelic frequency distributions from these loci can be analyzed to determine the ploidy of the fetus. Each approach has its own considerations due to the limited source material and the fact that maternal plasma consists of a mixture of maternal and fetal DNA. This method can be combined with other approaches to provide a more accurate determination. In one embodiment, this method can be combined with a sequence counting approach as described in US Patent No. 7,888,017. The described approaches could also be used to determine fetal paternity non-invasively from maternal plasma samples. In addition, each approach can be applied to other mixtures of DNA or to samples of pure DNA to detect the presence or absence of aneuploid chromosomes, to genotype a large number of SNPs from degraded DNA samples, to detect copy number variations (CNVs), to detect other genotypic states of interest, or some combination of them.
Accurately measuring allele distributions in a sample
[000226] Current sequencing approaches can be used to estimate the distribution of alleles in a sample. One such method involves randomly sampling sequences from a pool of DNA, called shotgun sequencing. The proportion of sequencing reads that cover a particular allele is typically very low and can be determined by simple statistics. The human genome contains approximately 3 billion base pairs. Thus, if the sequencing method used produces 100 bp reads, a particular allele will be measured approximately once in every 30 million sequence reads.
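The arithmetic behind the "once in every 30 million reads" figure can be sketched as follows. This is a minimal illustration that assumes reads are placed uniformly at random across the genome; the function names are hypothetical and not part of the described method.

```python
# Assumed values, taken from the example in the text.
GENOME_SIZE_BP = 3_000_000_000  # approximate size of the human genome
READ_LENGTH_BP = 100            # read length used in the example


def reads_per_locus_hit(genome_size_bp: int, read_length_bp: int) -> float:
    """Number of uniformly placed reads needed, on average, for one read to cover a given position."""
    return genome_size_bp / read_length_bp


def expected_locus_reads(total_reads: int, genome_size_bp: int, read_length_bp: int) -> float:
    """Expected number of reads covering one given position under uniform shotgun sampling."""
    return total_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    print(reads_per_locus_hit(GENOME_SIZE_BP, READ_LENGTH_BP))
    print(expected_locus_reads(30_000_000, GENOME_SIZE_BP, READ_LENGTH_BP))
```

Under these assumptions, the low per-locus yield of shotgun sequencing follows directly from the ratio of read length to genome size, which is the motivation for enriching the sample at the loci of interest.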
[000227] In one embodiment, a method of the present description is used to determine the presence or absence of two or more different haplotypes that contain the same set of loci in a DNA sample, from the measured allele distributions of loci on that chromosome. The different haplotypes could represent two different homologous chromosomes from an individual, three different homologous chromosomes from a trisomic individual, three different homologous haplotypes from a mother and fetus, where one of the haplotypes is shared between the mother and the fetus, three or four haplotypes from a mother and fetus, where one or two of the haplotypes are shared between the mother and the fetus, or other combinations. Alleles that are polymorphic between the haplotypes tend to be more informative; however, any allele for which the father and mother are not both homozygous for the same allele will yield useful information through the measured allele distributions, beyond the information that is available from a simple read count analysis.
[000228] Shotgun sequencing of such a sample, however, is extremely inefficient, as it produces many sequences for regions that are not polymorphic between the different haplotypes in the sample, or for chromosomes that are not of interest, and which therefore reveal no information about the proportion of the target haplotypes. Methods that specifically and / or preferentially enrich the DNA segments in the sample that are most likely to be polymorphic in the genome are described herein to increase the information yield obtained by sequencing. Note that in order for the allelic distributions measured in an enriched sample to be truly representative of the actual amounts present in the target individual, it is crucial that there is little or no preferential enrichment of one allele compared to the other allele at a given locus in the target segments. 
Current methods known in the art for targeting polymorphic alleles have been designed to ensure that at least some of any alleles present are detected. However, these methods were not designed for the purpose of measuring the unbiased allelic distributions of the polymorphic alleles present in the original mixture. It is not obvious that any particular target enrichment method would produce an enriched sample in which the measured allelic distributions represent the allelic distributions present in the original unamplified sample more accurately than any other method. While many enrichment methods might be expected, in theory, to achieve this goal, a person skilled in the art is aware that there is a great deal of deterministic and stochastic bias in current amplification, targeting and other preferential enrichment methods. One embodiment of a method described in this document allows a plurality of alleles found in a mixture of DNA that correspond to a given locus in the genome to be amplified, or preferentially enriched, in such a way that the degree of enrichment for each of the alleles is almost the same. Another way of saying this is that the method allows the relative quantity of the alleles present in the mixture as a whole to be increased, while the ratio between the alleles that correspond to each locus remains essentially the same as in the original DNA mixture. Prior art methods of preferential enrichment of loci can result in allelic biases of more than 1%, more than 2%, more than 5% and even more than 10%. This preferential enrichment may be due to capture bias when using a hybridization approach, or to amplification bias that may be small for each cycle but can become large when compounded over 20, 30 or 40 cycles. 
For the purposes of this description, for the ratio to remain essentially the same means that the ratio of alleles in the original mixture divided by the ratio of alleles in the resulting mixture is between 0.95 and 1.05, between 0.98 and 1.02, between 0.99 and 1.01, between 0.995 and 1.005, between 0.998 and 1.002, between 0.999 and 1.001, or between 0.9999 and 1.0001. Note that the calculation of the allele ratios presented here is not used to determine the ploidy status of the target individual, and is used only as a metric to measure the allelic bias.
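The bias metric defined above, and the way a small per-cycle amplification advantage compounds, can be sketched as follows. This is a minimal illustration: the function names, the default tolerance and the simple multiplicative per-cycle model are assumptions made for the example, not part of the described method.

```python
def allele_ratio(count_a: float, count_b: float) -> float:
    """Ratio of allele A to allele B at one locus."""
    return count_a / count_b


def bias_metric(original_ratio: float, enriched_ratio: float) -> float:
    """Original allele ratio divided by the allele ratio after enrichment (1.0 means no bias)."""
    return original_ratio / enriched_ratio


def is_essentially_same(original_ratio: float, enriched_ratio: float, tolerance: float = 0.05) -> bool:
    """True if the bias metric lies within 1 +/- tolerance, e.g. 0.95 to 1.05 for tolerance=0.05."""
    return abs(bias_metric(original_ratio, enriched_ratio) - 1.0) <= tolerance


def compounded_bias(per_cycle_advantage: float, cycles: int) -> float:
    """Fold change in the allele ratio if one allele amplifies (1 + advantage) times better per cycle.

    Assumed model: the advantage is constant and multiplies every cycle.
    """
    return (1.0 + per_cycle_advantage) ** cycles


if __name__ == "__main__":
    # A hypothetical 0.5% per-cycle advantage over 30 cycles distorts the ratio by about 16%.
    print(round(compounded_bias(0.005, 30), 3))
    print(is_essentially_same(allele_ratio(500, 500), allele_ratio(580, 500)))
```

Under this assumed model, a per-cycle difference that would be negligible in a single cycle falls well outside the tightest tolerance bands listed above after a typical number of cycles.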
[000229] In one embodiment, once a mixture has been preferentially enriched at the set of target loci, it can be sequenced using any of the previous, current or next generation sequencing instruments that sequence a clonal sample (a sample generated from a single molecule; examples include ILLUMINA GAIIx, ILLUMINA HiSeq, LIFE TECHNOLOGIES SOLiD, 5500XL). The ratios can be assessed by sequencing through the specific alleles within the target region. These sequencing reads can be analyzed and counted according to allele type, and the ratios of the different alleles determined accordingly. For variations that are one to a few bases in length, detection of the alleles is performed by sequencing, and it is essential that the sequencing read span the allele in question in order to evaluate the allelic composition of the captured molecule. The total number of captured molecules assayed for genotype can be increased by increasing the length of the sequencing read. Complete sequencing of all molecules would guarantee collection of the maximum amount of data available in the enriched pool. However, sequencing is currently expensive, and a method that can measure allelic distributions using fewer sequence reads will be of great value. In addition, there are technical limitations on the maximum possible read length, as well as accuracy limitations as read lengths increase. The most useful alleles will be one to a few bases in length, but, theoretically, any allele shorter than the length of the sequencing read can be used. While allele variations come in all types, the examples provided here focus on SNPs or variants contained in just a few neighboring base pairs. Larger variants, such as segmental copy number variants, can in many cases be detected by aggregating these smaller variations, as entire collections of SNPs internal to the segment are duplicated. 
Variants larger than a few bases, such as STRs, require special attention, and some targeting approaches work for them while others do not.
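The step of counting sequencing reads by allele type can be sketched as follows. This is a minimal illustration that assumes each read has already been aligned without gaps and is represented as a hypothetical `(start, sequence)` pair in reference coordinates; a real pipeline would work from aligner output and apply quality filters.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_alleles(reads: Iterable[Tuple[int, str]], snp_pos: int) -> Counter:
    """Count the base observed at snp_pos across all reads that span that position.

    Each read is (start, sequence), with start being the reference coordinate
    of the first base of the read (assumed ungapped alignment).
    """
    counts: Counter = Counter()
    for start, seq in reads:
        offset = snp_pos - start
        if 0 <= offset < len(seq):  # the read must span the polymorphic site
            counts[seq[offset]] += 1
    return counts


def allele_fraction(counts: Counter, allele: str) -> float:
    """Fraction of spanning reads that carry the given allele."""
    total = sum(counts.values())
    return counts[allele] / total if total else float("nan")


if __name__ == "__main__":
    example_reads = [(100, "ACGTA"), (102, "GTAAC"), (103, "TGACC"), (200, "AAAAA")]
    observed = count_alleles(example_reads, snp_pos=104)
    print(dict(observed), allele_fraction(observed, "A"))
```

Reads that do not span the polymorphic position contribute nothing to the count, which is why read length and probe placement matter for the information yield.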
[000230] There are several targeting approaches that can be used to specifically isolate and enrich one or a plurality of variant positions in the genome. Typically, they take advantage of the invariant sequence flanking the variant sequence. There is prior art related to targeting in the context of sequencing where the substrate is maternal plasma (see, for example, Liao et al., Clin Chem 2011, 57(1): 92 to 101). However, the prior art approaches use targeting probes that target exons, and do not focus on targeting the polymorphic regions of the genome. In one embodiment, a method of the present description involves the use of targeting probes that focus exclusively or almost exclusively on polymorphic regions. In one embodiment, a method of the present description involves the use of targeting probes that focus exclusively or almost exclusively on SNPs. In some embodiments of the present description, the targeted polymorphic sites consist of at least 10% SNPs, at least 20% SNPs, at least 30% SNPs, at least 40% SNPs, at least 50% SNPs, at least 60% SNPs, at least 70% SNPs, at least 80% SNPs, at least 90% SNPs, at least 95% SNPs, at least 98% SNPs, at least 99% SNPs, at least 99.9% SNPs, or exclusively SNPs.
[000231] In one embodiment, a method of the present description can be used to determine the genotypes (base DNA composition at specific loci) and the relative proportions of these genotypes from a mixture of DNA molecules, where these DNA molecules can have originated from one or a number of genetically distinct individuals. In one embodiment, a method of the present description can be used to determine the genotypes in a set of polymorphic loci, and the relative proportions of the number of different alleles present in those loci. In one embodiment, the polymorphic loci can consist entirely of SNPs. In one embodiment, the polymorphic loci can comprise SNPs, short repeated tandem sequences, and other polymorphisms. In one embodiment, a method of the present description can be used to determine the relative allelic distributions in a set of polymorphic loci in a DNA mixture, where the DNA mixture comprises DNA that originates from a mother, and DNA that originates of a fetus. In one embodiment, the joint allelic distributions can be determined in a mixture of DNA isolated from the blood of a pregnant woman. In one embodiment, the allelic distributions of a set of loci can be used to determine the ploidy state of one or more chromosomes in a gestating fetus.
[000232] In one embodiment, the mixture of DNA molecules could be derived from DNA extracted from several cells of an individual. In one embodiment, the original collection of cells from which DNA is derived may comprise a mixture of diploid or haploid cells of the same or different genotypes, whether that individual is a mosaic (germline or somatic). In one embodiment, the mixture of DNA molecules could also be derived from DNA extracted from single cells. In one embodiment, the mixture of DNA molecules could also be derived from DNA extracted from a mixture of two or more cells from the same individual, or from different individuals. In one embodiment, the mixture of DNA molecules could be derived from DNA isolated from biological material that has already been released from cells such as blood plasma, which is known to contain free DNA. In one embodiment, this biological material can be a mixture of DNA from one or more individuals, as is the case during pregnancy, where it has been shown that fetal DNA is present in the mixture. In one embodiment, the biological material could be from a mixture of cells that were found in maternal blood, where some of the cells are of fetal origin. In one embodiment, the biological material could be cells from the blood of a pregnant woman that have been enriched in fetal cells.
Circularization Probes
[000233] Some embodiments of the present description involve the use of "Linked Inverted Probes" (LIPs), which have been previously described in the literature. LIPs is a generic term intended to encompass technologies that involve the creation of a circular DNA molecule, where the probes are designed to hybridize to the target DNA region on either side of a target allele, such that the addition of appropriate polymerases and / or ligases, together with appropriate conditions, buffers and other reagents, will complete the complementary, inverted DNA region across the target allele to create a circular loop of DNA that captures the information found in the target allele. LIPs can also be called pre-circularized probes, pre-circularizing probes or circularizing probes. The LIP probe can be a linear DNA molecule between 50 and 500 nucleotides in length, and in some embodiments, between 70 and 100 nucleotides in length; in some embodiments, it may be longer or shorter than described here. Other embodiments of the present description involve different variants of LIP technology, such as padlock probes and Molecular Inversion Probes (MIPs).
[000234] One method of targeting specific locations for sequencing is to synthesize probes in which the 3' and 5' ends of the probe anneal to the target DNA at locations adjacent to and on either side of the target region, in an inverted manner, so that the addition of DNA polymerase and DNA ligase results in extension of the 3' end, adding bases to the single-stranded probe that are complementary to the target molecule (gap fill), followed by ligation of the new 3' end to the 5' end of the original probe, resulting in a circular DNA molecule that can subsequently be isolated from the background DNA. The probe ends are designed to flank the target region of interest. One version of this approach is generally called MIPs and has been used in conjunction with array technologies to determine the nature of the filled-in sequence. A disadvantage of using MIPs in the context of measuring allele ratios is that the hybridization, circularization and amplification steps do not occur at equal rates for different alleles at the same locus. This results in measured allele ratios that are not representative of the actual allele ratios in the original mixture.
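The geometry of an inverted probe can be sketched as follows: the two arms are the reverse complements of the template regions on either side of the gap, placed so that both probe ends face the gap that contains the polymorphic site. This is a simplified illustration; the function `design_inverted_probe`, its parameters, and the placeholder backbone are hypothetical, and real probe design would also account for melting temperature, secondary structure and known variants in the arms.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_inverted_probe(template: str, snp_index: int, arm_length: int,
                          offset: int, backbone: str):
    """Return (probe, gap) for a template strand written 5'->3'.

    offset is the number of template bases left between the polymorphic site
    and each probe arm. The probe is written 5'->3' as:
        revcomp(upstream arm region) + backbone + revcomp(downstream arm region)
    so that its 3' end anneals next to the gap on the downstream side and can
    be extended across the gap toward its own 5' end for ligation.
    """
    up_end = snp_index - offset              # exclusive end of the upstream arm region
    up_start = up_end - arm_length
    down_start = snp_index + 1 + offset      # start of the downstream arm region
    down_end = down_start + arm_length
    if up_start < 0 or down_end > len(template):
        raise ValueError("template too short for the requested arms")
    five_prime_arm = reverse_complement(template[up_start:up_end])
    three_prime_arm = reverse_complement(template[down_start:down_end])
    gap = template[up_end:down_start]        # template bases copied by gap fill
    return five_prime_arm + backbone + three_prime_arm, gap


if __name__ == "__main__":
    # "NN" stands in for a real linker sequence; it is a placeholder only.
    print(design_inverted_probe("GATTACAGCTTGGACCT", snp_index=8,
                                arm_length=4, offset=1, backbone="NN"))
```

Keeping the `offset` greater than zero places the probe ends a short distance away from the polymorphic site, so that neither arm hybridizes directly over the variant base.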
[000235] In one embodiment, the circularization probes are constructed in such a way that the probe region that is designed to hybridize upstream of the target polymorphic locus and the probe region that is designed to hybridize downstream to the target polymorphic locus are covalently connected through a non-nucleic acid structure. This structure can be any biocompatible molecule or combination of biocompatible molecules. Some examples of possible biocompatible molecules are poly (ethylene glycol), polycarbonates, polyurethanes, polyethylenes, polypropylenes, sulfone polymers, silicone, cellulose, fluoropolymers, acrylic compounds, styrene block copolymers, and other block copolymers.
[000236] In one embodiment of the present description, this approach has been modified to be easily adapted to sequencing as a means of interrogating the filled-in sequence. In order to retain the original allelic proportions of the original sample, at least one key consideration must be taken into account. The variable positions among different alleles in the gap-fill region must not be too close to the probe binding sites, as there can be initiation bias by the DNA polymerase, resulting in differential amplification of the variants. Another consideration is that additional variations may be present in the probe binding sites that are correlated with the variants in the gap-fill region, which can result in unequal amplification of different alleles. In one embodiment of the present description, the 3' and 5' ends of the pre-circularized probe are designed to hybridize to bases that are one or a few positions away from the variant positions (polymorphic sites) of the target allele. The number of bases between the polymorphic site (SNP or other) and the base to which the 3' or 5' end of the pre-circularized probe is designed to hybridize may be one base, two bases, three bases, four bases, five bases, six bases, seven to ten bases, eleven to fifteen bases, sixteen to twenty bases, twenty to thirty bases, or thirty to sixty bases. The forward and reverse primers can be designed to hybridize a different number of bases away from the polymorphic site. The circularization probes can be generated in large quantities with current DNA synthesis technology, allowing very large numbers of probes to be generated and potentially pooled, enabling the interrogation of many loci simultaneously. The approach has been reported to work with more than 300,000 probes. 
Two publications that discuss a method involving circularization probes that can be used to measure the genomic data of the target individual are: Porreca et al., Nature Methods, 2007, 4(11), p. 931 to 936, and Turner et al., Nature Methods, 2009, 6(5), p. 315 to 316. The methods described in these publications can be used in combination with other methods described here. Certain steps in the methods of these two publications can be used in combination with other steps from other methods described here.
[000237] In some embodiments of the methods described here, the genetic material of the target individual is optionally amplified, followed by hybridization of the pre-circularized probes, performing a gap fill to fill in the bases between the two ends of the hybridized probes, ligating the two ends to form a circularized probe, and amplifying the circularized probe, using, for example, rolling circle amplification. Once the desired target allelic genetic information is captured through the circularization of appropriately designed oligonucleotide probes, as in the LIP system, the genetic sequence of the circularized probes can be measured to provide the desired sequence data. In one embodiment, the appropriately designed oligonucleotide probes can be circularized directly on the unamplified genetic material of the target individual, and amplified afterwards. Note that a number of amplification procedures can be used to amplify the original genetic material, or the circularized LIPs, including rolling circle amplification, MDA, or other amplification protocols. Different methods can be used to measure the genetic information in the target genome, for example, high throughput sequencing, Sanger sequencing, other sequencing methods, capture by hybridization, capture by circularization, multiplex PCR, other hybridization methods, and combinations thereof.
[000238] Once the individual's genetic material has been measured using one or a combination of the above methods, a computer-based method, such as the PARENTAL SUPPORT® method, together with the appropriate genetic measurements, can then be used to determine the ploidy status of one or more chromosomes in the individual, and / or the genetic status of one or a set of alleles, specifically those alleles that are correlated with a disease or genetic state of interest. Note that the use of LIPs has been reported for the multiplexed capture of genetic sequences, followed by genotyping with sequencing. However, sequencing data resulting from a LIP-based strategy for the amplification of the genetic material found in a single cell, a small number of cells, or extracellular DNA has not been used for the purpose of determining the ploidy status of a target individual.
[000239] The application of a computer-based method to determine an individual's ploidy status from genetic data measured by hybridization arrays, such as the ILLUMINA INFINIUM array or the AFFYMETRIX gene chip, has been described in documents referenced elsewhere in this document. However, the method described here shows improvements over the methods previously described in the literature. For example, the LIP-based approach followed by high throughput sequencing unexpectedly provides better genotype data because the approach has better multiplexing capability, better capture specificity, better uniformity and low allelic bias. Greater multiplexing allows more alleles to be targeted, providing more accurate results. Better uniformity results in more of the targeted alleles being measured, providing more accurate results. Lower rates of allelic bias result in lower rates of miscalls, providing more accurate results. More accurate results lead to improved clinical outcomes and better medical care.
[000240] It is important to note that LIPs can be used as a method to target specific loci in a DNA sample for genotyping by methods other than sequencing. For example, LIPs can be used to target DNA for genotyping using SNP arrays or other DNA-based or RNA-based microarrays.
Ligation-mediated PCR
[000241] Ligation-mediated PCR is a PCR method used to preferentially enrich a DNA sample by amplifying one or more loci in a DNA mixture. The method comprises: obtaining a set of primer pairs, where each primer in the pair contains a target-specific sequence and a non-target sequence, where the target-specific sequences are designed to anneal to the target region, one upstream and one downstream of the polymorphic site, and which can be separated from the polymorphic site by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-20, 21-30, 31-40, 41-50, 51-100, or more than 100 bases; polymerizing DNA from the 3' end of the upstream primer to fill the single-stranded region between it and the 5' end of the downstream primer with nucleotides complementary to the target molecule; ligating the last polymerized base of the upstream primer to the adjacent 5' base of the downstream primer; and amplifying only the polymerized and ligated molecules, using the non-target sequences contained at the 5' end of the upstream primer and the 3' end of the downstream primer. The primer pairs for different targets can be mixed in the same reaction. The non-target sequences serve as universal sequences, such that all primer pairs that have been successfully polymerized and ligated can be amplified with a single pair of amplification primers.
Capture by hybridization
[000242] Preferential enrichment of a specific set of sequences in a target genome can be accomplished in several ways. Elsewhere in this document is a description of how LIPs can be used to target a specific set of sequences, but in all of these applications, other methods of preferential enrichment and / or targeting can be used equally well for the same purposes. An example of another targeting method is capture by hybridization. Some examples of commercial hybridization capture techniques include AGILENT's SURE SELECT and ILLUMINA's TruSeq. In capture by hybridization, a set of oligonucleotides that is complementary or almost complementary to the desired target sequences is allowed to hybridize with a mixture of DNA, and is then physically separated from the mixture. Since the desired sequences have hybridized to the targeting oligonucleotides, the effect of physically removing the targeting oligonucleotides is also to remove the target sequences. Once the hybridized oligos are removed, they can be heated above their melting temperature and the released sequences can be amplified. One way to physically remove the targeting oligonucleotides is to covalently attach the targeting oligos to a solid support, for example, a magnetic bead or a chip. Another way to physically remove the targeting oligonucleotides is to covalently attach them to a molecular moiety with a strong affinity for another molecular moiety. An example of such a molecular pair is biotin and streptavidin, as used in SURE SELECT. Thus, the targeting sequences can be covalently attached to a biotin molecule and, after hybridization, a solid support with streptavidin attached can be used to pull down the biotinylated oligonucleotides, to which the target sequences are hybridized.
[000243] Hybrid capture involves hybridizing probes that are complementary to the targets of interest to the target molecules. Hybrid capture probes were originally developed to target and enrich large fractions of the genome with relative uniformity between targets. In that application, it was important that all targets be amplified with enough uniformity that all regions could be detected by sequencing; however, no regard was paid to retaining the proportion of alleles in the original sample. After capture, the alleles present in the sample can be determined by direct sequencing of the captured molecules. These sequencing reads can be analyzed and counted according to allele type. However, using current technology, the allelic distributions measured from the captured sequences are typically not representative of the original allelic distributions.
[000244] In one embodiment, the detection of alleles is done by sequencing. In order to capture the identity of the allele at the polymorphic site, it is essential that the sequencing read span the allele in question, so that the allelic composition of the captured molecule can be evaluated. Since the captured molecules are often of variable length, sequencing cannot be guaranteed to overlap the variant positions unless the entire molecule is sequenced. However, cost considerations, as well as technical limitations on the maximum possible length and accuracy of sequencing reads, make sequencing the entire molecule unfeasible. In one embodiment, the read length can be increased from approximately 30 to approximately 50 or approximately 70 bases, which can greatly increase the number of reads that overlap the variant positions within the target sequences.
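The effect of read length on the chance that a read spans the variant position can be sketched with a deliberately simple model. The model is an assumption made for illustration only: the polymorphic site is taken to lie at a uniformly random position within a captured fragment, and a single read is taken from one end of the fragment. The 150-base fragment length is also an assumed example value.

```python
def span_probability(read_length: int, fragment_length: int) -> float:
    """Probability that a single end read covers a site placed uniformly within the fragment.

    Assumed model: one read from one end of the fragment, site position uniform
    over the fragment; a read at least as long as the fragment always covers it.
    """
    if read_length <= 0 or fragment_length <= 0:
        raise ValueError("lengths must be positive")
    return min(read_length, fragment_length) / fragment_length


if __name__ == "__main__":
    fragment = 150  # assumed captured fragment length, in bases
    for read in (30, 50, 70):
        print(read, round(span_probability(read, fragment), 3))
```

Under these assumptions, moving from 30-base to 70-base reads more than doubles the fraction of reads that are informative for the allele, which is consistent with the qualitative statement in the text; real capture data will deviate from this model because probe placement biases where the site falls within the fragment.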
[000245] Another way to increase the number of reads that interrogate the position of interest is to decrease the length of the probe, as long as this does not result in bias in the underlying enriched alleles. The synthesized probe should be long enough that two probes designed to hybridize to two different alleles found at one locus will hybridize with almost equal affinity to the various alleles in the original sample. Currently, methods known in the art describe probes that are typically longer than 120 bases. In a current embodiment, if the allele is one or a few bases in length, then the capture probes can be less than approximately 110 bases, less than approximately 100 bases, less than approximately 90 bases, less than approximately 80 bases, less than approximately 70 bases, less than approximately 60 bases, less than approximately 50 bases, less than approximately 40 bases, less than approximately 30 bases, or less than approximately 25 bases, and this is sufficient to ensure equal enrichment of all alleles. When the DNA mixture that will be enriched using hybrid capture technology is a mixture comprising free DNA isolated from blood, for example, maternal blood, the average length of the DNA is very short, typically less than 200 bases. The use of shorter probes results in a greater likelihood that the hybrid capture probes will capture the desired DNA fragments. Larger variations may require longer probes. In one embodiment, the variations of interest are from one (a SNP) to a few bases in length. In one embodiment, the target regions in the genome can be preferentially enriched using hybrid capture probes of less than 90 bases in length, which can be less than 80 bases, less than 70 bases, less than 60 bases, less than 50 bases, less than 40 bases, less than 30 bases, or less than 25 bases. 
In one embodiment, to increase the likelihood that the desired allele will be sequenced, the length of the probe that is designed to hybridize to the regions that flank the polymorphic location of the allele can be decreased from more than 90 bases, to approximately 80 bases, or approximately 70 bases, or approximately 60 bases, or approximately 50 bases, or approximately 40 bases, or approximately 30 bases, or approximately 25 bases.
[000246] There is a minimum overlap between the synthesized probe and the target molecule that is required to enable capture. The synthesized probe can be made as short as possible while still being longer than this minimum required overlap. The effect of using a shorter probe length to target a polymorphic region is that more molecules will overlap the target allele region. The fragmentation state of the original DNA molecules also affects the number of reads that overlap the target alleles. Some DNA samples, such as plasma samples, are already fragmented due to biological processes that occur in vivo. However, samples with longer fragments benefit from fragmentation before sequencing library preparation and enrichment. When both probes and fragments are short (approximately 60 to 80 bp), maximum specificity can be achieved, with relatively few sequence reads failing to overlap the critical region of interest.
[000247] In one embodiment, the hybridization conditions can be adjusted to maximize uniformity in the capture of the different alleles present in the original sample. In one embodiment, the hybridization temperatures are lowered to minimize differences in hybridization bias between the alleles. Methods known in the art avoid the use of lower temperatures for hybridization, because lowering the temperature has the effect of increasing hybridization of probes to unintended targets. However, when the goal is to preserve allele ratios with maximum fidelity, the approach of using lower hybridization temperatures provides optimally accurate allele ratios, despite the fact that the current art teaches away from this approach. The hybridization temperature can also be increased to require greater overlap between the target and the synthesized probe, so that only targets with substantial overlap in the target region are captured. In some embodiments of the present description, the hybridization temperature is reduced from the normal hybridization temperature to approximately 40 °C, to approximately 45 °C, to approximately 50 °C, to approximately 55 °C, to approximately 60 °C, to approximately 65 °C, or to approximately 70 °C.
[000248] In one embodiment, the hybrid capture probes can be designed so that the region of the capture probe with DNA that is complementary to the DNA found in the regions flanking the polymorphic allele is not immediately adjacent to the polymorphic site. Instead, the capture probe can be designed so that the region of the capture probe that is designed to hybridize to the DNA flanking the polymorphic site of the target is separated from the part of the capture probe that will be in van der Waals contact with the polymorphic site by a short distance, equivalent in length to one or a small number of bases. In one embodiment, the hybrid capture probe is designed to hybridize to a region that flanks the polymorphic allele but does not cross it; this can be called a flanking capture probe. The length of the flanking capture probe may be less than approximately 120 bases, less than approximately 110 bases, less than approximately 100 bases, less than approximately 90 bases, less than approximately 80 bases, less than approximately 70 bases, less than approximately 60 bases, less than approximately 50 bases, less than approximately 40 bases, less than approximately 30 bases, or less than approximately 25 bases. The region of the genome that is targeted by the flanking capture probe can be separated from the polymorphic locus by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-20, or more than 20 base pairs.
[000249] Described here is a disease screening test based on targeted sequence capture. Standard targeted sequence capture, such as that currently offered by AGILENT (SURE SELECT), ROCHE-NIMBLEGEN or ILLUMINA, can be used. Capture probes could be designed to ensure the capture of various types of mutations. For point mutations, one or more probes that overlap the point mutation should be sufficient to capture and sequence the mutation.
[000250] For small insertions or deletions, one or more probes that overlap the mutation may be sufficient to capture and sequence fragments comprising the mutation. Hybridization between such fragments and the probe, which is typically designed against the reference genome sequence, may be less efficient, limiting capture efficiency. To guarantee the capture of fragments comprising the mutation, two probes could be designed, one that corresponds to the normal allele and one that corresponds to the mutant allele. A longer probe can improve hybridization. Multiple overlapping probes can improve capture. Finally, placing a probe immediately adjacent to, but not overlapping, the mutation may allow relatively similar capture efficiency for the normal and mutant alleles.
[000251] For short tandem repeats (STRs), a probe overlapping these highly variable sites is unlikely to capture the fragment well. To improve capture, a probe could be placed adjacent to, but not overlapping, the variable site. The fragment could then be sequenced as normal to reveal the length and composition of the STR.
[000252] For large deletions, a series of overlapping probes, a common approach currently used in exome capture systems, may work. However, with this approach it can be difficult to determine whether or not an individual is heterozygous. Targeting and evaluating SNPs within the captured region could potentially reveal loss of heterozygosity across the region, indicating that an individual is a carrier. In one embodiment, it is possible to place non-overlapping or singleton probes across the potentially deleted region and use the number of captured fragments as a measure of heterozygosity. Where the individual carries a large deletion, half the number of fragments is expected to be available for capture relative to a non-deleted (diploid) reference locus. Consequently, the number of reads obtained from the deleted region should be approximately half that obtained from a normal diploid locus. Aggregating and averaging the sequencing read depth from multiple singleton probes across the potentially deleted region can strengthen the signal and improve diagnostic confidence. The two approaches, targeting SNPs to identify loss of heterozygosity and using several singleton probes to obtain a quantitative measure of the amount of underlying fragments from that locus, can also be combined. Either or both of these strategies can be combined with other strategies to better achieve the same purpose.
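By way of illustration only, the read-depth comparison described above can be sketched as follows. The function names, the example depths, and the tolerance of 0.2 are hypothetical choices made for this sketch and are not taken from the description; the only element drawn from the text is the expectation that a heterozygous deletion yields roughly half the reads of a diploid reference locus.

```python
from statistics import mean


def depth_ratio(region_depths, reference_depths):
    """Mean read depth over singleton probes in the candidate region,
    divided by mean read depth over diploid reference loci."""
    if not region_depths or not reference_depths:
        raise ValueError("both depth lists must be non-empty")
    reference_mean = mean(reference_depths)
    if reference_mean <= 0:
        raise ValueError("reference depth must be positive")
    return mean(region_depths) / reference_mean


def classify_region(ratio, tolerance=0.2):
    """Hypothetical decision rule: a ratio near 0.5 is consistent with a
    heterozygous deletion, a ratio near 1.0 with two copies."""
    if abs(ratio - 0.5) <= tolerance:
        return "consistent with heterozygous deletion"
    if abs(ratio - 1.0) <= tolerance:
        return "consistent with two copies"
    return "indeterminate"


if __name__ == "__main__":
    # Made-up depths for three singleton probes and three reference loci.
    ratio = depth_ratio([50, 48, 52], [100, 98, 102])
    print(ratio, classify_region(ratio))
```

Averaging over several probes before taking the ratio mirrors the aggregation step in the text; a real test would need a calibrated statistical threshold rather than the fixed tolerance assumed here.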
[000253] During cfDNA screening, detection of a male fetus, as indicated by the presence of Y-chromosome fragments captured and sequenced in the same test, together with either an X-linked dominant mutation where the father and mother are unaffected, or a dominant mutation where the mother is unaffected, would indicate risk to the fetus. The detection of two mutant recessive alleles within the same gene in an unaffected mother would imply that the fetus had inherited a mutant allele from the father and potentially a second mutant allele from the mother. In all cases, follow-up testing by amniocentesis or chorionic villus sampling may be indicated.
[000254] A disease screening test based on targeted capture could be combined with a non-invasive prenatal diagnostic test based on targeted capture for aneuploidy.
[000255] There are a number of ways to decrease the variability of the depth of read (DOR): for example, one could increase the primer concentrations, one could use longer targeted amplification probes, or one could perform more STA cycles (such as more than 25, more than 30, more than 35, or even more than 40).
Targeted PCR
[000256] In some embodiments, PCR can be used to target specific locations in the genome. In plasma samples, the original DNA is highly fragmented (typically less than 500 bp, with an average length of less than 200 bp). In PCR, both the forward and reverse primers need to anneal to the same fragment to allow amplification. Therefore, if the fragments are short, the PCR assays need to amplify relatively short regions as well. As with MIPs, if the polymorphic positions are very close to the polymerase binding site, this could result in biases in the amplification of different alleles. Currently, PCR primers that target polymorphic regions, such as those containing SNPs, are typically designed so that the 3' end of the primer hybridizes to the base immediately adjacent to the polymorphic base or bases. In one embodiment of the present description, the 3' ends of both the forward and reverse PCR primers are designed to hybridize to bases that are one or a few positions away from the variant positions (polymorphic sites) of the target allele. The number of bases between the polymorphic site (SNP or other) and the base to which the 3' end of the primer is designed to hybridize can be one base, two bases, three bases, four bases, five bases, six bases, seven to ten bases, eleven to fifteen bases, or sixteen to twenty bases. The forward and reverse primers can be designed to hybridize a different number of bases away from the polymorphic site.
[000257] PCR assays can be generated in large numbers; however, interactions between the different PCR assays make it difficult to multiplex them beyond approximately one hundred assays. Various complex molecular approaches can be used to increase the level of multiplexing, but they may still be limited to fewer than 100, perhaps 200, or possibly 500 assays per reaction. Samples with large quantities of DNA can be split among multiple subreactions and then recombined before sequencing. For samples where either the overall sample or some subpopulation of DNA molecules is limited, splitting the sample would introduce statistical noise. In one embodiment, a small or limited quantity of DNA can refer to less than 10 pg, between 10 and 100 pg, between 100 pg and 1 ng, between 1 and 10 ng, or between 10 and 100 ng. Note that while this method is particularly useful for small amounts of DNA, where other methods that involve splitting into multiple pools can cause significant problems related to the stochastic noise introduced, it still provides the benefit of minimizing bias when run on samples of any quantity of DNA. In these limited-sample situations, a universal pre-amplification step can be used to increase the overall sample quantity. Ideally, this pre-amplification step should not appreciably alter the allelic distributions.
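To give a sense of scale for the DNA quantities listed above, the following sketch converts a DNA mass into an approximate number of haploid human genome copies. The conversion factor of roughly 3.3 pg per haploid human genome is a commonly quoted approximation supplied here as an assumption; it does not appear in the description, and the function name is hypothetical.

```python
# Approximate mass of one haploid human genome in picograms.
# Assumption for this sketch; not a value stated in the description.
HAPLOID_GENOME_PG = 3.3


def haploid_genome_copies(mass_pg):
    """Approximate number of haploid genome copies in `mass_pg` picograms."""
    if mass_pg < 0:
        raise ValueError("mass must be non-negative")
    return mass_pg / HAPLOID_GENOME_PG


if __name__ == "__main__":
    # Boundaries of the quantity ranges named in the text, in picograms.
    for mass_pg in (10, 100, 1_000, 10_000, 100_000):
        print(f"{mass_pg:>7} pg ~ {haploid_genome_copies(mass_pg):,.0f} copies")
```

Under this assumption, 10 pg corresponds to only about three copies of any given locus, which is why splitting such a sample among subreactions is problematic.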
[000258] In one embodiment, a method of the present description can generate PCR products that are specific to a large number of target loci, specifically 1,000 to 5,000 loci, 5,000 to 10,000 loci, or more than 10,000 loci, for genotyping by sequencing or some other genotyping method, from limited samples such as single cells or DNA from body fluids. Currently, running multiplex PCR reactions on more than 5 to 10 targets presents a major challenge and is often hampered by primer side products, such as primer dimers, and other artifacts. When target sequences are detected using microarrays with hybridization probes, primer dimers and other artifacts can be ignored, as they are not detected. However, when sequencing is used as the detection method, the vast majority of sequencing reads would sequence such artifacts and not the desired target sequences in a sample. Methods described in the prior art for multiplexing more than 50 or 100 assays in one reaction followed by sequencing will typically result in more than 20%, often more than 50%, in many cases more than 80%, and in some cases more than 90% off-target sequence reads.
[000259] In general, to perform targeted sequencing of multiple (n) targets in a sample (more than 50, more than 100, more than 500, or more than 1,000), one can split the sample into a number of parallel reactions that each amplify one individual target. This has been done in multi-well PCR plates, or can be done on commercial platforms such as the FLUIDIGM ACCESS ARRAY (48 reactions per sample on microfluidic chips) or DROPLET PCR by RAINDANCE TECHNOLOGIES (hundreds to a few thousand targets). Unfortunately, these split-and-pool methods are problematic for samples with a limited amount of DNA, as there are often not enough copies of the genome to ensure that there is a copy of each region of the genome in each well. This is an especially serious problem when polymorphic loci are targeted and the relative proportions of the alleles at the polymorphic loci are needed, as the stochastic noise introduced by splitting and pooling will cause inaccurate measurements of the proportions of the alleles that were present in the original DNA sample. Described here is a method for effectively and efficiently amplifying many PCR reactions that is applicable to cases where only a limited amount of DNA is available. In one embodiment, the method can be applied to the analysis of single cells, body fluids, mixtures of DNA such as the cell-free DNA found in maternal plasma, biopsies, and environmental and/or forensic samples.
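The copy-number problem with split-and-pool methods can be illustrated with a simple probability sketch. It assumes, purely for illustration, that each copy of a locus lands in a well independently and uniformly at random; this model and the function name are not part of the description.

```python
def p_well_lacks_locus(copies, wells):
    """Probability that one given well receives no copy of a locus when
    `copies` molecules are each assigned independently and uniformly at
    random to one of `wells` wells (illustrative model only)."""
    if copies < 0 or wells < 1:
        raise ValueError("copies must be >= 0 and wells must be >= 1")
    return (1.0 - 1.0 / wells) ** copies


if __name__ == "__main__":
    # 48 wells, as on the 48-reaction microfluidic chip mentioned in the text.
    for copies in (10, 48, 100, 1000):
        p = p_well_lacks_locus(copies, 48)
        print(f"{copies:>5} copies: P(well has none) = {p:.3f}")
```

Under this model, a locus present in only a few dozen copies is absent from a given well much of the time. Even when a copy is present, the two alleles of a heterozygous locus are sampled unevenly, which is the stochastic noise the text refers to.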
[000260] In one embodiment, targeted sequencing can involve one, several, or all of the following steps: (a) Generate and amplify a library with adapter sequences at both ends of the DNA fragments. (b) Split into multiple reactions after amplifying the library. (c) Generate and, optionally, amplify a library with adapter sequences at both ends of the DNA fragments. (d) Perform 1,000- to 10,000-plex amplification of selected targets using one target-specific "forward" primer per target and one tag-specific primer. (e) Perform a second amplification of that product using target-specific "reverse" primers and one (or more) primer specific to a universal tag that was introduced as part of the target-specific forward primers in the first step. (f) Perform a 1,000-plex pre-amplification of selected targets for a limited number of cycles. (g) Divide the product into multiple aliquots and amplify subgroups of targets in individual reactions (for example, 50- to 500-plex, although this can be taken all the way down to singleplex). (h) Pool the products of the parallel subgroup reactions. (i) During these amplifications, the primers can carry sequencing-compatible tags (full or partial length) so that the products can be sequenced.
Highly multiplexed PCR
[000261] Methods are described here that allow targeted amplification of more than a hundred to tens of thousands of target sequences (e.g., SNP loci) from genomic DNA obtained from plasma. The amplified sample can be relatively free of primer dimer products and have low allelic bias at the target loci. If, during or after amplification, sequencing-compatible adapters are appended to the products, these products can be analyzed by sequencing.
[000262] Performing a highly multiplexed PCR amplification using methods known in the art generates primer dimer products that are in excess of the desired amplification products and are not suitable for sequencing. These can be reduced empirically by eliminating the primers that form these products, or by performing in silico selection of primers. However, the greater the number of assays, the more difficult this problem becomes.
[000263] One solution is to divide the 5,000-plex reaction into several lower-plex amplifications, for example, one hundred 50-plex reactions or fifty 100-plex reactions, or to use microfluidics, or even to split the sample into individual PCR reactions. However, if the DNA sample is limited, as in non-invasive prenatal diagnostics from pregnancy plasma, splitting the sample between multiple reactions should be avoided, as this will result in bottlenecking.
[000264] Described here are methods for first globally amplifying the plasma DNA of a sample and then dividing the sample into multiple multiplexed target enrichment reactions with more moderate numbers of target sequences per reaction. In one embodiment, a method of the present description can be used to preferentially enrich a mixture of DNA at a plurality of loci, the method comprising one or more of the following steps: generating and amplifying a library from a mixture of DNA, where the molecules in the library have adapter sequences ligated at both ends of the DNA fragments; dividing the amplified library into several reactions; and performing a first round of multiplex amplification of selected targets using one target-specific "forward" primer per target and one or more adapter-specific universal "reverse" primers. In one embodiment, a method of the present description further includes performing a second amplification using target-specific "reverse" primers and one or more primers specific to a universal tag that was introduced as part of the target-specific forward primers in the first round. In one embodiment, the method may involve a fully nested, hemi-nested, semi-nested, one-sided fully nested, one-sided hemi-nested, or one-sided semi-nested PCR approach. In one embodiment, a method of the present description is used to preferentially enrich a mixture of DNA at a plurality of loci, the method comprising performing a multiplex pre-amplification of selected targets for a limited number of cycles, dividing the product into multiple aliquots and amplifying subgroups of targets in individual reactions, and pooling the products of the parallel subgroup reactions. Note that this approach can be used to perform targeted amplification in a way that results in low levels of allelic bias for 50 to 500 loci, for 500 to 5,000 loci, for 5,000 to 50,000 loci, or even for 50,000 to 500,000 loci. In one embodiment, the primers carry full or partial length sequencing-compatible tags.
[000265] The workflow may entail (1) extracting DNA from plasma, (2) preparing a fragment library with universal adapters at both ends of the fragments, (3) amplifying the library using adapter-specific universal primers, (4) dividing the amplified sample "library" into several aliquots, (5) performing multiplex amplifications (for example, approximately 100-plex, 1,000-plex or 10,000-plex, with one target-specific primer per target and one tag-specific primer) on the aliquots, (6) pooling the aliquots of one sample, (7) barcoding the sample, (8) mixing the samples and adjusting the concentration, and (9) sequencing the sample. The workflow can comprise several sub-steps within one of the listed steps (for example, step (2), preparing the library, could entail three enzymatic steps (blunt ending, dA tailing and adapter ligation) and three purification steps). Workflow steps can be combined, divided or performed in a different order (for example, barcoding and pooling of samples).
[000266] It is important to note that the amplification of a library can be performed in such a way that it is biased to amplify short fragments more efficiently. In this way, it is possible to preferentially amplify shorter sequences, for example, mononucleosomal DNA fragments such as the cell-free fetal DNA (of placental origin) found in the circulation of pregnant women. Note that PCR assays can have tags, for example, sequencing tags (usually a truncated form of 15-25 bases). After multiplexing, the PCR multiplexes of one sample are pooled, and then the tags are completed (including barcoding) by a tag-specific PCR (this could also be done by ligation). Alternatively, complete sequencing tags can be added in the same reaction as the multiplexing. In the first cycles, the targets can be amplified with the target-specific primers; subsequently, the tag-specific primers take over to complete the SQ-adapter sequence. The PCR primers may carry no tags. Sequencing tags can be appended to the amplification products by ligation.
[000267] In one embodiment, highly multiplexed PCR followed by evaluation of the amplified material by clonal sequencing can be used to detect fetal aneuploidy. Whereas traditional multiplex PCRs evaluate up to fifty loci simultaneously, the approach described here can be used to evaluate more than 50 loci simultaneously, more than 100 loci simultaneously, more than 500 loci simultaneously, more than 1,000 loci simultaneously, more than 5,000 loci simultaneously, more than 10,000 loci simultaneously, more than 50,000 loci simultaneously, and more than 100,000 loci simultaneously. Experiments have shown that up to, and including, more than 10,000 distinct loci can be evaluated simultaneously in a single reaction, with sufficiently good efficiency and specificity to make non-invasive prenatal aneuploidy diagnoses and/or copy number calls with high accuracy. The assays can be combined in a single reaction with an entire cfDNA sample isolated from maternal plasma, a fraction thereof, or a further processed derivative of the cfDNA sample. The cfDNA or derivative can also be divided into several parallel multiplex reactions. The optimal sample splitting and multiplexing is determined by trading off various performance specifications. Because the amount of material is limited, dividing the sample into several fractions can introduce sampling noise, add handling time, and increase the possibility of error. Conversely, higher multiplexing can result in greater amounts of spurious amplification and greater inequalities in amplification, both of which can reduce test performance.
[000268] Two crucial considerations in applying the methods described in this document are the limited amount of original plasma and the number of original molecules in that material from which the allele frequency or other measurements are obtained. If the number of original molecules falls below a certain level, random sampling noise becomes significant and can affect the accuracy of the test. Typically, data of sufficient quality to make non-invasive prenatal aneuploidy diagnoses can be obtained if measurements are made on a sample comprising the equivalent of 500-1000 original molecules per target locus. There are a number of ways to increase the number of distinct measurements, for example, by increasing the sample volume. Each manipulation applied to the sample also potentially results in loss of material. It is essential to characterize the losses incurred by the various manipulations and to avoid, or if necessary improve the yield of, certain manipulations so as to avoid losses that could degrade test performance.
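The effect of the number of original molecules on random sampling noise can be illustrated with the standard error of a binomial proportion. This is a textbook approximation offered as a sketch; the description does not state which noise model underlies the 500-1000 molecule figure, and the function name is hypothetical.

```python
import math


def allele_fraction_se(allele_fraction, n_molecules):
    """Binomial standard error of an allele fraction estimated from
    `n_molecules` independently sampled original molecules."""
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must lie in [0, 1]")
    if n_molecules <= 0:
        raise ValueError("n_molecules must be positive")
    return math.sqrt(allele_fraction * (1.0 - allele_fraction) / n_molecules)


if __name__ == "__main__":
    # A heterozygous locus (expected fraction 0.5) at several molecule counts.
    for n in (50, 100, 500, 1000):
        print(f"{n:>5} molecules: SE = {allele_fraction_se(0.5, n):.4f}")
```

Because the standard error falls only with the square root of the molecule count, halving the available molecules per locus, for example by splitting the sample in two, increases the per-locus noise by a factor of about 1.4.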
[000269] In one embodiment, it is possible to mitigate potential losses in subsequent steps by amplifying all or part of the original cfDNA sample. Several methods are available to amplify all the genetic material in a sample, increasing the amount available for downstream procedures. In one embodiment, in ligation-mediated PCR (LM-PCR), DNA fragments are amplified by PCR after ligation of one distinct adapter, two distinct adapters, or many distinct adapters. In one embodiment, multiple displacement amplification (MDA) with phi-29 polymerase is used to amplify all the DNA isothermally. In DOP-PCR and its variations, random priming is used to amplify the DNA of the original material. Each method has certain characteristics, such as the uniformity of amplification across all represented regions of the genome, the efficiency of capture and amplification of the original DNA, and the amplification performance as a function of fragment length.
[000270] In one embodiment, LM-PCR can be used with a single heteroduplexed adapter having a 3' thymidine. The heteroduplexed adapter allows the use of a single adapter molecule that can be converted into two distinct sequences at the 5' and 3' ends of the original DNA fragment during the first round of PCR. In one embodiment, it is possible to fractionate the amplified library by size separation, or with products such as AMPURE, TASS or other similar methods. Prior to ligation, the sample DNA may be blunt ended, and then a single adenosine base is added to the 3' end. Prior to ligation, the DNA can be cleaved using a restriction enzyme or some other cleavage method. During ligation, the 3' adenosine of the sample fragments and the complementary 3' thymidine overhang of the adapter can enhance ligation efficiency. The extension step of the PCR amplification can be limited in time to reduce the amplification of fragments longer than approximately 200 bp, approximately 300 bp, approximately 400 bp, approximately 500 bp or approximately 1,000 bp. Since the longer DNA found in maternal plasma is almost exclusively maternal, this can result in enrichment of the fetal DNA by 10-50% and improve test performance. Several reactions were run using the conditions specified by commercially available kits; these resulted in successful ligation of fewer than 10% of the sample's DNA molecules. A series of optimizations of the reaction conditions for this ligation improved it to approximately 70%.
Mini-PCR
[000271] Traditional PCR assay design results in significant losses of distinct fetal molecules, but the losses can be greatly reduced by designing very short PCR assays, called mini-PCR assays. The fetal cfDNA in maternal serum is highly fragmented, and the fragment sizes are distributed in approximately Gaussian fashion with a mean of 160 bp, a standard deviation of 15 bp, a minimum size of approximately 100 bp, and a maximum size of approximately 220 bp. The distribution of the start and end positions of the fragments relative to the targeted polymorphisms, although not necessarily random, varies widely between individual targets and between all targets collectively, and the polymorphic site of a particular target locus can occupy any position from start to end among the various fragments originating from that locus. Note that the term mini-PCR may equally well refer to normal PCR with no additional restrictions or limitations.
[000272] During PCR, amplification will only occur from template DNA fragments comprising both the forward and reverse primer sites. As the fetal cfDNA fragments are short, the probability that a fetal fragment of length L comprises both the forward and reverse primer sites depends on the ratio of the length of the amplicon to the length of the fragment. Under ideal conditions, assays in which the amplicon is 45, 50, 55, 60, 65, or 70 bp will successfully amplify from 72%, 69%, 66%, 63%, 59%, or 56%, respectively, of the available template fragment molecules. The length of the amplicon is the distance between the 5' ends of the forward and reverse primer sites. An amplicon length shorter than that typically used by those skilled in the art can result in more efficient measurement of the desired polymorphic loci while requiring only short sequence reads. In one embodiment, a substantial fraction of the amplicons should be less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp or less than 45 bp.
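The percentages quoted above can be reproduced with a simple calculation. The formula (L − A)/L, where L is the fragment length and A the amplicon length, and the choice of L = 160 bp (the mean fragment size given earlier) are inferred here from the quoted figures; the description does not write the formula in this form, so the sketch should be read as one interpretation.

```python
def amplifiable_fraction(fragment_len, amplicon_len):
    """Fraction of fragments of length `fragment_len` overlapping the target
    that contain the whole amplicon, assuming the amplicon is equally likely
    to sit at any offset within the fragment (inferred model, see lead-in)."""
    if fragment_len <= 0 or amplicon_len < 0:
        raise ValueError("lengths must be positive")
    if amplicon_len >= fragment_len:
        return 0.0
    return (fragment_len - amplicon_len) / fragment_len


def to_percent(fraction):
    """Round to a whole percent with halves rounded up (Python's built-in
    round() rounds halves to the nearest even integer instead)."""
    return int(100 * fraction + 0.5)


if __name__ == "__main__":
    mean_fragment_bp = 160  # mean fetal cfDNA fragment size given in the text
    for amplicon_bp in (45, 50, 55, 60, 65, 70):
        pct = to_percent(amplifiable_fraction(mean_fragment_bp, amplicon_bp))
        print(f"{amplicon_bp} bp amplicon: {pct}%")
```

A fuller treatment would average this fraction over the fragment size distribution rather than using the mean alone; the single-length version is used here because it already matches the stated values.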
[000273] Note that in methods known in the prior art, short assays such as those described here are generally avoided, because they are not required and they impose considerable constraints on primer design by limiting primer length, annealing characteristics, and the distance between the forward and reverse primers.
[000274] Note also that the potential for biased amplification exists if the 3' end of either primer is within approximately 1-6 bases of the polymorphic site. This single-base difference at the site of initial polymerase binding can result in preferential amplification of one allele, which can alter the observed allele frequencies and degrade performance. All of these constraints make it very challenging to identify primers that successfully amplify a particular locus and, furthermore, to design large sets of primers that are compatible in the same multiplex reaction. In one embodiment, the 3' ends of the forward and reverse primers are designed to hybridize to a region of DNA upstream of the polymorphic site, separated from the polymorphic site by a small number of bases. Ideally, the number of bases can be between 6 and 10 bases, but it can also be between 4 and 15 bases, between 3 and 20 bases, between 2 and 30 bases, or between 1 and 60 bases, and achieve substantially the same result.
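As a minimal sketch of how a primer design tool might apply the spacing rule above, the following functions count the bases between a primer's 3' end and the polymorphic site and test that count against a window. The coordinate convention (integer positions on one reference strand) and the function names are assumptions of this sketch; only the 6 to 10 base and 4 to 15 base windows come from the text.

```python
def gap_bases(primer_3prime_pos, snp_pos):
    """Number of bases lying strictly between the base under the primer's
    3' end and the polymorphic site (0 means immediately adjacent).
    Positions are integer coordinates on the same reference strand."""
    if primer_3prime_pos == snp_pos:
        raise ValueError("primer 3' end overlaps the polymorphic site")
    return abs(snp_pos - primer_3prime_pos) - 1


def within_window(gap, low=6, high=10):
    """True if the gap falls inside the window; the default is the
    6 to 10 base range described as ideal in the text."""
    return low <= gap <= high


if __name__ == "__main__":
    snp = 107  # hypothetical coordinate of the polymorphic site
    for end in (100, 101, 106, 120):
        gap = gap_bases(end, snp)
        print(f"3' end at {end}: gap {gap}, ideal window: {within_window(gap)}")
```

Widening the window, for example `within_window(gap, low=4, high=15)`, corresponds to the broader ranges the text lists as achieving substantially the same result.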
[000275] Multiplex PCR can involve a single round of PCR in which all targets are amplified, or it can involve one round of PCR followed by one or more rounds of nested PCR or some variant of nested PCR. Nested PCR consists of a subsequent round or rounds of PCR amplification using one or more new primers that bind internally, by at least one base pair, to the primers used in the previous round. Nested PCR reduces the number of spurious amplification targets by amplifying, in subsequent reactions, only those amplification products from the previous round that have the correct internal sequence. Reducing spurious amplification targets increases the number of useful measurements that can be obtained, especially in sequencing. Nested PCR typically entails designing primers completely internal to the binding sites of the previous primers, necessarily increasing the minimum size of DNA segment required for amplification. For samples such as maternal plasma cfDNA, in which the DNA is highly fragmented, the larger assay size reduces the number of distinct cfDNA molecules from which a measurement can be obtained. In one embodiment, to offset this effect, a partial nesting approach can be used in which one or both of the second-round primers overlap the first binding sites, extending internally by some number of bases, to achieve additional specificity while minimally increasing the total assay size.
[000276] In one embodiment, a multiplex pool of PCR assays is designed to amplify potentially heterozygous SNPs or other polymorphic or non-polymorphic loci on one or more chromosomes, and these assays are used in a single reaction to amplify DNA. The number of PCR assays can be between 50 and 200, between 200 and 1,000, between 1,000 and 5,000, between 5,000 and 20,000, or more than 20,000 (50- to 200-plex, 200- to 1,000-plex, 1,000- to 5,000-plex, 5,000- to 20,000-plex, and more than 20,000-plex, respectively). In one embodiment, a multiplex pool of approximately 10,000 PCR assays (10,000-plex) is designed to amplify potentially heterozygous SNP loci on chromosomes X, Y, 13, 18, and 21 and 1 or 2, and these assays are used in a single reaction to amplify cfDNA obtained from a maternal plasma sample, chorionic villus samples, amniocentesis samples, single cells or a small number of cells, other body fluids or tissues, cancers, or other genetic material. The SNP frequencies of each locus can be determined by clonal sequencing or some other method of sequencing the amplicons. Statistical analysis of the allele frequency distributions or ratios of all assays can be used to determine whether the sample contains a trisomy of one or more of the chromosomes included in the test. In another embodiment, the original cfDNA sample is divided into two samples and parallel 5,000-plex assays are performed. In another embodiment, the original cfDNA sample is divided into n samples and parallel (~10,000/n-plex) assays are performed, where n is between 2 and 12, between 12 and 24, between 24 and 48, or between 48 and 96. Data are collected and analyzed in a manner similar to that already described. Note that this method is equally applicable to detecting translocations, deletions, duplications, and other chromosomal abnormalities.
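The division of a roughly 10,000-plex pool into n parallel reactions can be sketched as a simple even partition. The function name and the policy of spreading the remainder over the first few aliquots are illustrative assumptions; the description specifies only that each parallel reaction is approximately 10,000/n-plex.

```python
def assays_per_aliquot(total_assays, n_aliquots):
    """Split `total_assays` as evenly as possible across `n_aliquots`,
    returning the plex level of each parallel reaction."""
    if total_assays < 0 or n_aliquots < 1:
        raise ValueError("need total_assays >= 0 and n_aliquots >= 1")
    base, extra = divmod(total_assays, n_aliquots)
    # The first `extra` aliquots take one additional assay each.
    return [base + 1 if i < extra else base for i in range(n_aliquots)]


if __name__ == "__main__":
    for n in (2, 12, 24, 48, 96):
        plex = assays_per_aliquot(10_000, n)
        print(f"n = {n:>2}: {min(plex)} to {max(plex)}-plex per reaction")
```

In practice the assignment of assays to aliquots would also take primer compatibility into account, which this sketch does not attempt.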
[000277] In one embodiment, tails with no homology to the target genome can also be added to the 3' or 5' end of any of the primers. These tails facilitate subsequent manipulations, procedures or measurements. In one embodiment, the tail sequence can be the same for the forward and reverse target-specific primers. In one embodiment, different tails can be used for the forward and reverse target-specific primers. In one embodiment, several different tails can be used for different loci or sets of loci. Certain tails can be shared among all loci or among subsets of loci. For example, using forward and reverse tails corresponding to the forward and reverse sequences required by any of the current sequencing platforms can allow direct sequencing after amplification. In one embodiment, the tails can be used as common priming sites among all amplified targets, which can be used to add other useful sequences. In some embodiments, the inner primers may contain a region that is designed to hybridize upstream or downstream of the targeted polymorphic locus. In some embodiments, the primers may contain a molecular barcode. In some embodiments, the primer may contain a universal priming sequence designed to allow PCR amplification.
[000278] In one embodiment, a 10,000-plex pool of PCR assays is created in which the forward and reverse primers have tails corresponding to the forward and reverse sequences required by a high-throughput sequencing instrument, such as the HISEQ, GAIIX, or MISEQ available from ILLUMINA. In addition, included 5' to the sequencing tails is an additional sequence that can be used as a priming site in a subsequent PCR to add nucleotide barcode sequences to the amplicons, allowing multiplex sequencing of multiple samples in a single lane of the high-throughput sequencing instrument.
[000279] In one embodiment, a 10,000-plex pool of PCR assays is created in which the reverse primers have tails corresponding to the reverse sequences required by a high-throughput sequencing instrument. After amplification with the first 10,000-plex assay, a subsequent PCR amplification can be performed using another 10,000-plex pool having partially nested forward primers (e.g., nested by 6 bases) for all targets and a reverse primer corresponding to the reverse sequencing tail included in the first round. This subsequent round of partially nested amplification, with only one target-specific primer and one universal primer, limits the required size of the assay, reducing sampling noise, while greatly reducing the number of spurious amplicons. Sequencing tags can be added to appended ligation adapters and/or as part of PCR probes, such that the tag is part of the final amplicon.
[000280] The fetal fraction affects test performance. There are a number of ways to enrich the fetal fraction of the DNA found in maternal plasma. The fetal fraction can be increased by the LM-PCR method described previously, as well as by targeted removal of long maternal fragments. In one embodiment, prior to multiplex PCR amplification of the target loci, an additional multiplex PCR reaction can be performed to selectively remove long and largely maternal fragments corresponding to the loci targeted in the subsequent multiplex PCR. The additional primers are designed to anneal to a site at a greater distance from the polymorphism than is expected to be present among cell-free fetal DNA fragments. These primers can be used in a one-cycle multiplex PCR reaction prior to the multiplex PCR of the target polymorphic loci. These distal primers are tagged with a molecule or moiety that allows selective recognition of the tagged pieces of DNA. In one embodiment, these DNA molecules can be covalently modified with a biotin molecule, which allows removal of the newly formed double-stranded DNA comprising these primers after one PCR cycle. The double-stranded DNA formed during this first round is probably of maternal origin. Removal of the hybrid material can be carried out using magnetic streptavidin beads. Other tagging methods may work equally well. In one embodiment, size selection methods can be used to enrich the sample for shorter strands of DNA; for example, those less than approximately 800 bp, less than approximately 500 bp, or less than approximately 300 bp. Amplification of the short fragments can then proceed as usual.
[000281] The mini-PCR method described in this description allows highly multiplexed amplification and analysis of hundreds to thousands or even millions of loci in a single reaction, from a single sample. At the same time, the detection of the amplified DNA can be multiplexed; tens to hundreds of samples can be multiplexed in one sequencing lane by using barcoding PCR. This multiplexed detection has been successfully tested up to 49-plex, and a much higher degree of multiplexing is possible. In effect, this allows hundreds of samples to be genotyped at thousands of SNPs in a single sequencing run. For these samples, the method allows determination of genotype and heterozygosity rate and, simultaneously, determination of copy number, both of which can be used for the purpose of detecting aneuploidy. This method is particularly useful in detecting aneuploidy of a gestating fetus from the free DNA found in maternal plasma. This method can be used as part of a method to determine the sex of a fetus and/or to predict the paternity of the fetus. It can be used as part of a method for measuring mutation. This method can be used for any amount of DNA or RNA, and the target regions can be SNPs, other polymorphic regions, non-polymorphic regions, and combinations thereof.
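As an illustration of the sample multiplexing described above, the following is a minimal sketch of how barcoded reads from one sequencing lane might be separated by sample. It assumes, purely for illustration, that each read begins with a fixed-length sample barcode; the function name `demultiplex` and its parameters are hypothetical and are not taken from this description.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using a fixed-length barcode at the start of each read.

    Assumption (not from the description): the barcode is the first
    `barcode_len` bases of the read. Reads with an unknown barcode are dropped.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is not None:
            by_sample[sample].append(insert)
    return dict(by_sample)
```

Each sample's reads can then be genotyped independently, which is what allows many samples to share a single sequencing run.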
[000282] In some embodiments, ligation-mediated universal PCR amplification of fragmented DNA can be used. Ligation-mediated universal PCR amplification can be used to amplify plasma DNA, which can then be divided into several parallel reactions. It can also be used to preferentially amplify short fragments, thus enriching the fetal fraction. In some embodiments, the addition of markers to the fragments by ligation may allow the detection of shorter fragments, the use of shorter target specific portions of the primers, and/or annealing at higher temperatures, which reduces non-specific reactions.
[000283] The methods described here can be used for a number of purposes where there is a target set of DNA that is mixed with a quantity of contaminating DNA. In some embodiments, the target DNA and the contaminating DNA may be from individuals who are genetically related. For example, genetic abnormalities in a fetus (target) can be detected from maternal plasma, which contains fetal DNA (target) and also maternal DNA (contaminant); anomalies include whole chromosome anomalies (e.g., aneuploidy), partial chromosomal anomalies (e.g., deletions, duplications, inversions, translocations), polynucleotide polymorphisms (e.g., STRs), single nucleotide polymorphisms, and/or other anomalies or genetic differences. In some embodiments, the target DNA and the contaminating DNA may be from the same individual, but where the target DNA and the contaminating DNA differ by one or more mutations, for example, in the case of cancer. (See, for example, H. Mamon et al., Preferential Amplification of Apoptotic DNA from Plasma: Potential for Enhancing Detection of Minor DNA Alterations in Circulating DNA. Clinical Chemistry 54: 9 (2008).) In some embodiments, DNA can be found in (apoptotic) cell culture supernatant. In some embodiments, it is possible to induce apoptosis in biological samples (e.g., blood) for subsequent library preparation, amplification and/or sequencing. Various workflows and protocols to achieve this end are presented in this description.
[000284] In some embodiments, the target DNA may originate from single cells, from DNA samples consisting of less than one copy of the target genome, from small amounts of DNA, from DNA of mixed origin (e.g., pregnancy plasma: placental and maternal DNA; cancer patient plasma and tumors: mix between healthy and cancer DNA; transplantation; etc.), from other body fluids, from cell cultures, from culture supernatants, from forensic DNA samples, from ancient DNA samples (for example, insects trapped in amber), from other DNA samples, and combinations thereof.
[000285] In some embodiments, a short-sized amplicon can be used. Short-sized amplicons are especially suitable for fragmented DNA (see, for example, A. Sikora, et al., Detection of increased amounts of cell-free fetal DNA with short PCR amplicons. Clin Chem. 2010 Jan; 56 (1): 136-8).
[000286] The use of short amplicons can result in some significant benefits. Short amplicons can result in optimized amplification efficiency. Short amplicons typically produce shorter products, so there is less chance of non-specific priming. Shorter products can be clustered more densely on the sequencing flow cell, as the clusters will be smaller. Note that the methods described here can work equally well for longer PCR amplicons. The length of the amplicon can be increased if necessary, for example, when sequencing larger sequence stretches. Experiments with targeted 146-plex amplification with 100 bp and 200 bp long assays as the first step in a nested PCR protocol were performed on single cells and on genomic DNA, with positive results.
[000287] In some embodiments, the methods described here can be used to amplify and/or detect SNPs, copy number, nucleotide methylation, mRNA levels, other types of RNA expression levels, and other genetic and/or epigenetic characteristics. The mini-PCR methods described here can be used in conjunction with next generation sequencing; they can also be used with other downstream methods, such as microarrays, digital PCR counting, real-time PCR, mass spectrometry analysis, etc.
[000288] In some embodiments, the mini-PCR amplification methods described here can be used as part of a method for the accurate quantification of minority populations. They can be used for absolute quantification using spike-in calibrators. They can be used for mutation or minor allele quantification through very deep sequencing, and can be performed in a highly multiplexed manner. They can be used for standard paternity tests and identity testing of relatives or ancestors in humans, animals, plants or other creatures. They can be used for forensic testing. They can be used for rapid genotyping and copy number (CN) analysis on any type of material, for example, amniotic fluid and CVS, sperm, and product of conception (POC). They can be used for single cell analysis, such as genotyping of samples obtained by embryo biopsy. They can be used for rapid embryo analysis (within less than one, one, or two days of biopsy) by targeted sequencing using mini-PCR.
[000289] In some embodiments, they can be used for tumor analysis: tumor biopsies are often a mixture of tumor and healthy cells. Targeted PCR allows deep sequencing of SNPs and loci with almost no background sequence. It can be used for copy number and loss of heterozygosity analysis on tumor DNA. Said tumor DNA can be present in many different body fluids or tissues of tumor patients. It can be used for the detection of tumor recurrence, and/or tumor screening. It can be used for seed quality control testing. It can be used for farming or fishing purposes. Note that any of these methods can be used equally well when targeting non-polymorphic loci for the purpose of determining ploidy.
[000290] Some literature describing some of the fundamental methods underlying the methods described here includes: (1) Wang HY, Luo M, Tereshchenko IV, Frikker DM, Cui X, Li JY, Hu G, Chu Y, Azaro MA, Lin Y, Shen L, Yang Q, Kambouris ME, Gao R, Shih W, Li H. Genome Res. February 2005; 15 (2): 276-83. Department of Molecular Genetics, Microbiology and Immunology / The Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, New Jersey 08903, USA. (2) High-throughput genotyping of single nucleotide polymorphisms with high sensitivity. Li H, Wang HY, Cui X, Luo M, Hu G, Greenawalt DM, Tereshchenko IV, Li JY, Chu Y, Gao R. Methods Mol Biol. 2007; 396 - PubMed PMID: 18025699. (3) A method comprising multiplexing of an average of 9 assays for sequencing is described in: Nested Patch PCR enables highly multiplexed mutation discovery in candidate genes. Varley KE, Mitra RD. Genome Res. 2008 Nov; 18 (11): 1844-50. Epub October 10, 2008. Note that the methods described here allow multiplexing that is orders of magnitude greater than in the references above.
Primer Design
[000291] Highly multiplexed PCR can often result in the production of a very high proportion of product DNA that results from non-productive side reactions such as primer dimer formation. In one embodiment, the particular primers that are most likely to cause non-productive side reactions can be removed from the primer library to obtain a primer library that will result in a higher proportion of amplified DNA that maps to the genome. The step of removing problematic primers, that is, those primers that are particularly likely to form dimers, unexpectedly enabled extremely high levels of PCR multiplexing for subsequent sequencing analysis. In systems such as sequencing, where performance degrades significantly in the presence of primer dimers and/or other spurious products, multiplexing more than 10, more than 50, and more than 100 times greater than other described multiplexing has been achieved. Note that this is in contrast to probe-based detection methods, for example, microarrays, TaqMan PCR, etc., where an excess of primer dimers will not appreciably affect the outcome. It is also noted that the general belief in the art is that multiplex PCR for sequencing is limited to running 48 to 1000s of PCR assays in parallel reactions for a sample.
[000292] There are a number of ways to choose primers for a library such that the amount of non-mapping primer dimer or other spurious primer products is minimized. Empirical data indicate that a small number of 'bad' primers are responsible for a large number of non-mapping primer dimer side reactions. Removing these 'bad' primers can increase the percentage of sequence reads that map to the target loci. One way to identify the 'bad' primers is to examine sequencing data from DNA that has been amplified by targeted amplification; the primers involved in the dimers that are seen most often can be removed to provide a primer library that is significantly less likely to result in by-product DNA that does not map to the genome. There are also publicly available programs that can calculate the binding energy of various combinations of primers, and removing those with the highest binding energy will also provide a primer library that is significantly less likely to result in by-product DNA that does not map to the genome.
[000293] Multiplexing large numbers of primers imposes significant restrictions on the assays that can be included. Assays that interact unintentionally result in spurious amplification products. The size restrictions of mini-PCR may result in further restrictions. In one embodiment, it is possible to start with a very large number of potential SNP targets (between approximately 500 and more than 1 million) and try to design primers to amplify each SNP. Where primers can be designed, it is possible to try to identify pairs of primers likely to form spurious products by assessing the probability of spurious primer duplex formation among all possible primer pairs using published thermodynamic parameters for DNA duplex formation. Primer interactions can be ranked by a score function related to the interaction, and primers with the worst interaction scores are eliminated until the desired number of primers is reached. In cases where SNPs likely to be heterozygous are more useful, it is also possible to rank the list of assays and select the compatible assays most likely to be heterozygous. Experiments validated that primers with high interaction scores are more likely to form primer dimers. At high multiplexing, it is not possible to eliminate all spurious interactions, but it is essential to remove in silico the primers or pairs of primers with the highest interaction scores, as they can dominate an entire reaction, greatly limiting the amplification of the intended targets. This procedure was performed to create multiplex primer sets of up to 10,000 primers. The improvement due to this procedure is substantial, allowing amplification in which more than 80%, more than 90%, more than 95%, more than 98%, and even more than 99% of the products are on target, as determined by sequencing all PCR products, compared to 10% from a reaction in which the worst primers were not removed. When combined with a partial semi-nested approach, as previously described, more than 90%, and even more than 95% of amplicons can map to the target sequences.
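The elimination procedure described above can be sketched as a simple greedy loop. This is only an illustrative sketch under stated assumptions: the description calls for published thermodynamic parameters for duplex formation, whereas the hypothetical `interaction_score` below substitutes a crude proxy (the longest 3' suffix of one primer that is reverse complementary to any part of the other). The function names and the `max_k` cap are assumptions made for this example.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def interaction_score(p, q, max_k=8):
    """Crude stand-in for a thermodynamic score (assumption, not the patent's model).

    Returns the longest 3' suffix length (capped at max_k) of either primer
    whose reverse complement occurs anywhere in the other primer.
    """
    best = 0
    for a, b in ((p, q), (q, p)):
        for k in range(1, min(max_k, len(a)) + 1):
            if revcomp(a[-k:]) in b:
                best = max(best, k)
    return best


def prune_primers(primers, target_count):
    """Repeatedly drop the primer with the worst pairwise score until target_count remain."""
    pool = list(primers)
    while len(pool) > target_count:
        worst = {p: 0 for p in pool}
        for p, q in combinations(pool, 2):
            s = interaction_score(p, q)
            worst[p] = max(worst[p], s)
            worst[q] = max(worst[q], s)
        # Ties are broken by list order; a real design would use a finer score.
        pool.remove(max(pool, key=lambda p: worst[p]))
    return pool
```

Recomputing all pairwise scores on every pass is quadratic in the pool size; for libraries of thousands of primers one would cache the pairwise scores, but the greedy logic stays the same.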
[000294] Note that there are other methods for determining which PCR probes are likely to form dimers. In one embodiment, analysis of a pool of DNA that has been amplified using a non-optimized set of primers may be sufficient to determine problematic primers. For example, the analysis can be done using sequencing, and the primers whose dimers are present in the greatest numbers are identified as those most likely to form dimers, and can be removed.
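A minimal sketch of this empirical approach follows. It assumes, for illustration only, that a read is a primer dimer when it starts with one primer, ends with the reverse complement of a primer, and is no longer than the two primers combined (that is, it carries no target insert). The function name `dimer_counts` and this classification rule are assumptions, not part of this description.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def dimer_counts(reads, primers):
    """Count, per primer name, how many dimer-like reads the primer takes part in.

    `primers` maps a primer name to its sequence. The dimer rule used here
    (no room for an insert between the two primers) is an illustrative assumption.
    """
    counts = Counter()
    for read in reads:
        starts = [name for name, seq in primers.items() if read.startswith(seq)]
        ends = [name for name, seq in primers.items() if read.endswith(revcomp(seq))]
        for a in starts:
            for b in ends:
                if len(read) <= len(primers[a]) + len(primers[b]):
                    counts.update({a, b})  # a set, so a self-dimer counts once
    return counts
```

Calling `dimer_counts(reads, primers).most_common(n)` then lists the `n` primers most often seen in dimer-like reads, which are the candidates for removal.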
[000295] This method has a number of potential applications, for example, SNP genotyping, determining the rate of heterozygosity, measuring copy number, and other targeted sequencing applications. In one embodiment, the primer design method can be used in combination with the mini-PCR method described elsewhere in this document. In some embodiments, the primer design method can be used as part of a massively multiplexed PCR method.
[000296] The use of markers in the primers can reduce the amplification and sequencing of primer dimer products. Primer markers can be used to shorten the required target specific sequence to below 20, below 15, below 12, and even below 10 base pairs. This can occur by chance with a standard primer design, when the target sequence is fragmented within the primer binding site, or it can be deliberately built into the primer design. The advantages of this method include: it increases the number of assays that can be designed for a given maximum amplicon length, and it shortens the "non-informative" sequencing of primer sequence. It can also be used in combination with internal marking (see elsewhere in this document).
[000297] In one embodiment, the relative amount of non-productive products in targeted multiplexed PCR amplification can be reduced by increasing the annealing temperature. In cases where libraries with the same marker as the target specific primers are being amplified, the annealing temperature can be increased compared to genomic DNA, as the markers will contribute to primer binding. In some embodiments, considerably lower primer concentrations than previously reported are used, together with longer annealing times than reported elsewhere. Annealing times may be longer than 10 minutes, longer than 20 minutes, longer than 30 minutes, longer than 60 minutes, longer than 120 minutes, longer than 240 minutes, longer than 480 minutes, and even longer than 960 minutes. In one embodiment, longer annealing times are used than in previous reports, thus allowing lower primer concentrations. In some embodiments, primer concentrations are as low as 50 nM, 20 nM, 10 nM, 5 nM, 1 nM, and less than 1 nM. This surprisingly results in robust performance for highly multiplexed reactions, for example, 1,000-plex reactions, 2,000-plex reactions, 5,000-plex reactions, 10,000-plex reactions, 20,000-plex reactions, 50,000-plex reactions, and even 100,000-plex reactions. In one embodiment, the amplification uses one, two, three, four or five cycles performed with long annealing times, followed by PCR cycles with more usual annealing times with marked primers.
[000298] To select the target locations, one can start with a group of candidate primer pair designs and create a thermodynamic model of potentially adverse interactions between the primer pairs, and then use the model to eliminate the designs that are incompatible with other designs in the group.
Variants of Targeted PCR - Nesting
[000299] There are many workflows that are possible when conducting PCR; some typical workflows for the methods described here are described. The steps described here are not intended to exclude other possible steps, nor do they imply that any of the steps described here are required for the method to work properly. A large number of parameter variations or other modifications are known in the literature, and can be made without affecting the essence of the invention. A particular generalized workflow is given below, followed by a number of possible variants. The variants typically refer to possible secondary PCR reactions, for example, different types of nesting that can be done (step 3). It is important to note that variants can be made at different times, or in different orders than explicitly described here.
1. The DNA in the sample can have ligation adapters, often called library markers or ligation adapter markers (LTs), attached, where the ligation adapters contain a universal priming sequence, followed by universal amplification. In one embodiment, this can be done using a standard protocol designed to create sequencing libraries after fragmentation. In one embodiment, the DNA sample can be blunt ended, and then an A can be added at the 3' end. A Y-adapter with a T overhang can be added and ligated. In some embodiments, other cohesive ends may be used, other than an A or T overhang. In some embodiments, other adapters may be added, for example, looped ligation adapters. In some embodiments, adapters may have a marker designed for PCR amplification.
2. Specific target amplification (STA): Pre-amplification of hundreds to thousands to tens of thousands and even hundreds of thousands of targets can be multiplexed in one reaction. The STA is typically run from 10 to 30 cycles, though it may be run from 5 to 40 cycles, from 2 to 50 cycles, and even from 1 to 100 cycles. Primers may have a tail, for example, for a simpler workflow or to avoid sequencing a large proportion of dimers. Note that typically, dimers of two primers carrying the same marker will not be amplified or sequenced efficiently. In some embodiments, between 1 and 10 cycles of PCR can be performed; in some embodiments, between 10 and 20 cycles of PCR can be performed; in some embodiments, between 20 and 30 PCR cycles can be performed; in some embodiments, between 30 and 40 PCR cycles can be performed; in some embodiments, more than 40 PCR cycles can be performed. The amplification can be a linear amplification. The number of PCR cycles can be optimized to result in an optimal depth of read (DOR) profile. Different DOR profiles may be desirable for different purposes. In some embodiments, a more even distribution of reads among all assays is desirable; if the DOR is too small for some assays, the stochastic noise may be too high for the data to be very useful, while if the depth of read is too high, the marginal utility of each additional read is relatively small.
[000300] Primer tails can enhance the detection of fragmented DNA from universally labeled libraries. If the library marker and the primer tails contain a homologous sequence, hybridization can be improved (for example, melting temperature (TM) is reduced) and primers can be extended even if only part of the primer target sequence is in the DNA fragment from the sample. In some embodiments, 13 or more target specific base pairs can be used. In some embodiments, 10 to 12 target specific base pairs can be used. In some embodiments, 8 to 9 target specific base pairs can be used. In some embodiments, 6 to 7 target specific base pairs can be used. In some embodiments, STA can be performed on pre-amplified DNA, for example, from MDA, RCA, other whole genome amplifications, or adapter-mediated universal PCR. In some embodiments, STA can be performed on samples that are enriched for or depleted of certain sequences and populations, for example, by size selection, target capture, or targeted degradation.
3. In some embodiments, it is possible to perform secondary multiplex PCRs or primer extension reactions to increase specificity and reduce undesirable products. For example, full nesting, semi-nesting, hemi-nesting, and/or subdivision into parallel reactions of smaller assay groups are all techniques that can be used to increase specificity. Experiments showed that dividing the sample into three 400-plex reactions resulted in product DNA with greater specificity than a 1,200-plex reaction with exactly the same primers. Similarly, experiments showed that dividing the sample into four 2,400-plex reactions resulted in product DNA with greater specificity than a 9,600-plex reaction with exactly the same primers. In one embodiment, it is possible to use target specific primers and marker specific primers of the same or opposite directionality.
4. In some embodiments, it is possible to amplify a DNA sample (diluted, purified or otherwise) produced by an STA reaction using marker specific primers and "universal amplification", that is, to amplify many or all of the pre-amplified and marked targets. The primers may contain additional functional sequences, for example, barcodes, or a complete adapter sequence required for sequencing on a high throughput sequencing platform.
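The depth of read trade-off noted in step 2 can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the reads at a locus are independent binomial draws, so that the standard error of a measured allele fraction is sqrt(p(1-p)/n); this simple model and the function name `allele_fraction_se` are assumptions and are not the statistical model of this description.

```python
import math


def allele_fraction_se(depth, p=0.5):
    """Standard error of an observed allele fraction under binomial sampling.

    Illustrative assumption: `depth` independent reads, each showing the
    allele with probability `p`.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(p * (1.0 - p) / depth)
```

Under this assumption the error falls only with the square root of the depth: quadrupling the number of reads halves the error, which is one way to see why very low DOR is noisy while additional reads at already high DOR add comparatively little.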
[000301] These methods can be used to analyze any DNA sample, and are especially useful when the DNA sample is particularly small, or when the DNA in the sample originates from more than one individual, such as in the case of maternal plasma. These methods can be used on DNA samples such as a single cell or a small number of cells, genomic DNA, plasma DNA, amplified plasma libraries, amplified apoptotic supernatant libraries, or other mixed DNA samples. In one embodiment, these methods can be used in the case where cells of different genetic makeup may be present in a single individual, such as with cancer or transplants.
[000302] Protocol variants (variants and/or additions to the workflow above)
[000303] Direct multiplexed mini-PCR: The specific target amplification (STA) of a plurality of target sequences with marked primers is shown in Figure 1. 101 denotes double-stranded DNA with a polymorphic locus of interest at X. 102 denotes double-stranded DNA with added ligation adapters for universal amplification. 103 denotes single-stranded DNA that has been universally amplified with hybridized PCR primers. 104 denotes the final PCR product. In some embodiments, STA can be performed on more than 100, more than 200, more than 500, more than 1,000, more than 2,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, more than 100,000 or more than 200,000 targets. In a subsequent reaction, marker specific primers amplify all target sequences and extend the markers to include all sequences necessary for sequencing, including sample indices. In one embodiment, the primers may not be marked, or only certain primers may be marked. Sequencing adapters can be added by conventional adapter ligation. In one embodiment, the initial primers can carry the markers.
[000304] In one embodiment, the primers are designed so that the length of the amplified DNA is unusually short. The prior art demonstrates that those skilled in the art typically design 100+ bp amplicons. In one embodiment, amplicons can be designed to be less than 80 bp. In one embodiment, amplicons can be designed to be less than 70 bp. In one embodiment, amplicons can be designed to be less than 60 bp. In one embodiment, amplicons can be designed to be less than 50 bp. In one embodiment, amplicons can be designed to be less than 45 bp. In one embodiment, amplicons can be designed to be less than 40 bp. In one embodiment, amplicons can be designed to be less than 35 bp. In one embodiment, amplicons can be designed to be between 40 and 65 bp.
[000305] An experiment was carried out with this protocol using 1200-plex amplification. Both genomic DNA and pregnancy plasma were used; approximately 70% of sequence reads mapped to target sequences. Details are presented elsewhere in this document. Sequencing of a 1042-plex without design and assay selection resulted in more than 99% of the sequences being primer dimer products.
[000306] Sequential PCR: After STA1, multiple aliquots of the product can be amplified in parallel with pools of reduced complexity containing the same primers. The first amplification can provide enough material to divide. This method is especially good for small samples, for example, those that are approximately 6-100 pg, approximately 100 pg to 1 ng, approximately 1 ng to 10 ng, or approximately 10 ng to 100 ng. The protocol was performed with a 1200-plex divided into three 400-plexes. The mapping of sequencing reads increased from approximately 60 to 70% with the 1200-plex alone to more than 95%.
[000307] Semi-Nested Mini-PCR: (see Figure 2) After STA, a second STA is performed comprising a multiplex set of internal nested forward primers (103 B, 105 b) and one (or some) marker specific reverse primers (103 A). 101 denotes double-stranded DNA with a polymorphic locus of interest at X. 102 denotes double-stranded DNA with added ligation adapters for universal amplification. 103 denotes single-stranded DNA that has been universally amplified with forward primer B and reverse primer A hybridized. 104 denotes the PCR product from 103. 105 denotes the product of 104 with the nested forward primer b hybridized, and with the reverse marker A already part of the molecule from the PCR that occurred between 103 and 104. 106 denotes the final PCR product. With this workflow, more than 95% of sequences generally map to the intended targets. The nested primer can overlap with the outer forward primer sequence, but introduces additional bases at the 3' end. In some embodiments, it is possible to use between one and 20 extra 3' bases. Experiments have shown that the use of 9 or more extra 3' bases in a 1200-plex design works well.
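The relationship between the outer forward primer and the nested forward primer in the semi-nested workflow can be sketched as follows. This is an illustrative sketch only: it assumes the template strand is available as a string and that the nested primer has the same length as the outer primer unless stated otherwise; the function name `nested_primer` and its parameters are hypothetical.

```python
def nested_primer(template, outer_start, outer_len, extra_3prime, nested_len=None):
    """Derive a nested forward primer whose 3' end lies `extra_3prime` bases
    beyond the 3' end of the outer forward primer on the same strand.

    The 1 to 20 range follows the text above; everything else is an assumption.
    """
    if not 1 <= extra_3prime <= 20:
        raise ValueError("extra_3prime must be between 1 and 20")
    if nested_len is None:
        nested_len = outer_len
    end = outer_start + outer_len + extra_3prime
    start = end - nested_len
    if start < 0 or end > len(template):
        raise ValueError("nested primer does not fit on the template")
    return template[start:end]
```

With equal lengths, the nested primer shares all but `extra_3prime` bases with the outer primer, so it anneals to products of the first round while requiring additional matching template bases at its 3' end.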
[000308] Fully nested mini-PCR: (see Figure 3) After step 1 of STA, it is possible to perform a second multiplex PCR (or parallel multiplex PCRs of reduced complexity), with two nested primers carrying markers (A, a, B, b). 101 denotes double-stranded DNA with a polymorphic locus of interest at X. 102 denotes double-stranded DNA with added ligation adapters for universal amplification. 103 denotes single-stranded DNA that has been universally amplified with forward primer B and reverse primer A hybridized. 104 denotes the PCR product of 103. 105 denotes the product of 104 with the nested forward primer b and the nested reverse primer a hybridized. 106 denotes the final PCR product. In some embodiments, it is possible to use two complete sets of primers. A fully nested mini-PCR protocol was used in experiments to perform 146-plex amplification on single cells and on three cells without step 102 of attaching universal ligation adapters and amplifying.
[000309] Hemi-nested mini-PCR: (see Figure 4) It is possible to use target DNA that has adapters at the fragment ends. STA is performed comprising a multiplex set of forward primers (B) and one (or some) marker specific reverse primer (A). A second STA can be performed using a universal marker specific forward primer and a target specific reverse primer. 101 denotes double-stranded DNA with a polymorphic locus of interest at X. 102 denotes double-stranded DNA with added ligation adapters for universal amplification. 103 denotes single-stranded DNA that has been universally amplified with reverse primer A hybridized. 104 denotes the PCR product of 103 that was amplified using the reverse primer A and the ligation adapter marker primer LT. 105 denotes the product of 104 with the forward primer B hybridized. 106 denotes the final PCR product. In this workflow, target specific forward and reverse primers are used in separate reactions, thereby reducing the complexity of the reaction and preventing the formation of dimers of forward and reverse primers. Note that, in this example, primers A and B can be considered as first primers, and primers 'a' and 'b' can be considered as internal primers. This method is a big improvement over direct PCR, since it is as good as direct PCR, but avoids primer dimers. After the first round of the hemi-nested protocol, ~99% of non-target DNA is normally seen; however, after the second round, there is usually a great improvement.
[000310] Triply hemi-nested mini-PCR: (see Figure 5) It is possible to use target DNA that has adapters at the fragment ends. STA is performed comprising a multiplex set of forward primers (B) and one (or a few) marker specific reverse primer (A) and (a). A second STA can be performed using a marker specific forward primer and a target specific reverse primer. 101 denotes double-stranded DNA with a polymorphic locus of interest at X. 102 denotes double-stranded DNA with added ligation adapters for universal amplification. 103 denotes single-stranded DNA that has been universally amplified with reverse primer A hybridized. 104 denotes the PCR product of 103 that was amplified using the reverse primer A and the ligation adapter marker primer LT. 105 denotes the product of 104 with the forward primer B hybridized. 106 denotes the PCR product of 105 which was amplified using reverse primer A and forward primer B. 107 denotes the product of 106 with the reverse primer 'a' hybridized. 108 denotes the final PCR product. Note that, in this example, primers 'a' and B can be considered as internal primers, and A can be considered as a first primer. Optionally, both A and B can be considered as first primers, and 'a' can be considered as an internal primer. The designation of forward and reverse primers can be switched. In this workflow, target specific forward and reverse primers are used in separate reactions, thus reducing the reaction complexity and preventing the formation of dimers of forward and reverse primers. This method is a big improvement over direct PCR, since it is as good as direct PCR, but it avoids primer dimers. After the first round of the hemi-nested protocol, ~99% of non-target DNA is usually seen; however, after the second round, there is usually a great improvement.
[000311] Unilateral nested mini-PCR: (see Figure 6) It is possible to use target DNA that has an adapter at the fragment ends. STA can also be performed with a multiplex set of nested forward primers and using the ligation adapter marker as the reverse primer. A second STA can then be performed using a set of nested forward primers and a universal reverse primer. 101 denotes double-stranded DNA with a polymorphic locus of interest at X. 102 denotes double-stranded DNA with added ligation adapters for universal amplification. 103 denotes single-stranded DNA that has been universally amplified with forward primer A hybridized. 104 denotes the PCR product of 103 that was amplified using forward primer A and the ligation adapter marker reverse primer LT. 105 denotes the product of 104 with the nested forward primer 'a' hybridized. 106 denotes the final PCR product. This method can detect shorter target sequences than standard PCR by using overlapping primers in the first and second STAs. The method is typically performed on a DNA sample that has already undergone STA step 1 above, that is, attaching universal markers and amplification; the two nested primers are on one side only, and the other side uses the library marker. The method was performed on libraries of apoptotic supernatants and pregnancy plasma. With this workflow, approximately 60% of the sequences mapped to the intended targets. Note that the reads that contained the reverse adapter sequence were not mapped, so this number is expected to be higher if the reads that contain the reverse adapter sequence are mapped.
[000312] Unilateral mini-PCR: It is possible to use target DNA that has an adapter at the fragment ends (see Figure 7). STA can be performed with a multiplex set of forward primers and one (or a few) marker specific reverse primer. 101 denotes double-stranded DNA with a polymorphic locus of interest at X. 102 denotes double-stranded DNA with added ligation adapters for universal amplification. 103 denotes single-stranded DNA with forward primer A hybridized. 104 denotes the PCR product of 103 that has been amplified using forward primer A and the ligation adapter marker reverse primer LT, and which is the final PCR product. This method can detect shorter target sequences than standard PCR. However, it can be relatively non-specific, as only one target specific primer is used. This protocol is effectively half of the unilateral nested mini-PCR.
[000313] Reverse semi-nested mini-PCR: It is possible to use target DNA that has an adapter at the fragment ends (see Figure 8). STA can be performed with a multiplex set of forward primers and one (or a few) tag-specific reverse primer(s). 101 denotes double-stranded DNA with a polymorphic locus of interest at X. 102 denotes the double-stranded DNA with ligation adapters added for universal amplification. 103 denotes single-stranded DNA with reverse primer B hybridized. 104 denotes the PCR product from 103 that was amplified using reverse primer B and the forward ligation adapter tag (LT) primer. 105 denotes the PCR product 104 with forward primer A and inner reverse primer 'b' hybridized. 106 denotes the PCR product that was amplified from 105 using forward primer A and reverse primer 'b', which is the final PCR product. This method can detect shorter target sequences than standard PCR.
[000314] There may also be more variants that are simply iterations or combinations of the above methods, such as doubly nested PCR, where three sets of primers are used. Another variant is one-and-a-half-sided nested mini-PCR, where STA can also be performed with a multiplex set of nested forward primers and one (or a few) tag-specific reverse primer(s).
[000315] Note that in all these variants, the identity of the forward and reverse primers can be interchanged. Note that in some embodiments, the nested variant can equally well be run without the initial library preparation, which comprises attaching the adapter tags and a universal amplification step. Note that in some embodiments, additional rounds of PCR can be included, with additional forward and/or reverse primers and amplification steps; these additional steps can be particularly useful if it is desirable to further increase the percentage of DNA molecules that correspond to the target loci.
Nesting workflows
[000316] There are many ways to perform amplification, with different degrees of nesting and different degrees of multiplexing. In Figure 9, a flow chart is given with some of the possible workflows. Note that the use of 10,000-plex PCR is intended to be an example only; these flow charts would work equally well for other degrees of multiplexing.
Looped ligation adapters
[000317] When adding universal tagged adapters, for example, for the purpose of making a library for sequencing, there are several ways to ligate adapters. One way is to blunt-end the sample DNA, perform A-tailing, and ligate with adapters that have a T-overhang. There are also several adapters that can be ligated. For example, a Y-adapter can be used, in which the adapter consists of two strands of DNA: one strand has a double-strand region and a region specified by a forward primer region, and the other strand has a double-strand region that is complementary to the double-strand region on the first strand and a region with a reverse primer. The double-stranded region, when annealed, may contain a T-overhang for the purpose of ligating to double-stranded DNA with an A-overhang.
[000318] In one embodiment, the adapter can be a loop of DNA where the terminal regions are complementary, and where the loop region contains a forward primer tagged region (LFT), a reverse primer tagged region (LRT), and a cleavage site between the two (see Figure 10). 101 refers to blunt-ended double-stranded target DNA. 102 refers to A-tailed target DNA. 103 refers to the looped ligation adapter with T-overhang 'T' and cleavage site 'Z'. 104 refers to target DNA with looped ligation adapters attached. 105 refers to the target DNA with the attached ligation adapters cleaved at the cleavage site. LFT refers to the forward ligation adapter tag, and LRT refers to the reverse ligation adapter tag. The complementary region can end in a T-overhang, or another feature can be used for ligation to the target DNA. The cleavage site can be a series of uracils for cleavage by UNG, or a sequence that can be recognized and cleaved by a restriction enzyme or other cleavage method, or just a basic amplification. These adapters can be used for any library preparation, for example, for sequencing. These adapters can be used in combination with any of the other methods described here, for example, the mini-PCR amplification methods.
Internally tagged primers
[000319] When using sequencing to determine the allele present at a given polymorphic locus, the sequence read typically begins upstream of the primer binding site (a) and then proceeds to the polymorphic site (X). The tags are typically configured as shown in Figure 11, left. 101 refers to the single-stranded target DNA with polymorphic locus of interest 'X', and the primer 'a' with the attached tag 'b'. In order to avoid non-specific hybridization, the primer binding site (the region of target DNA complementary to 'a') is typically 18 to 30 bp in length. The sequence tag 'b' is typically approximately 20 bp; in theory, it can be any length greater than approximately 15 bp, although many people use the primer sequences that are sold by the sequencing platform company. The distance 'd' between 'a' and 'X' can be at least 2 bp in order to avoid allelic bias. When performing multiplex PCR amplification using the methods described here or other methods, where careful primer design is necessary to avoid excessive primer-primer interaction, the window of allowed distance 'd' between 'a' and 'X' can vary considerably: from 2 bp to 10 bp, from 2 bp to 20 bp, from 2 bp to 30 bp, or even from 2 bp to more than 30 bp. Therefore, when using the primer configuration shown in Figure 11, left, sequence reads need to be a minimum of 40 bp to be long enough to measure the polymorphic locus, and depending on the lengths of 'a' and 'd', the sequence reads may need to be up to 60 or 75 bp. Generally, the longer the sequence reads, the higher the cost and the longer the time needed to sequence a given number of reads, so minimizing the necessary read length can save both time and money. In addition, since, on average, bases read earlier in a read are read more accurately than those read later in the read, decreasing the necessary sequence read length can also increase the accuracy of the measurements of the polymorphic region.
[000320] In one embodiment, called internally tagged primers, the primer binding site (a) is split into a plurality of segments (a', a'', a''', ...), and the sequence tag (b) is on a segment of DNA that lies between two of the primer binding segments, as shown in Figure 11, 103. This configuration allows the sequencer to make shorter sequence reads. In one embodiment, a' + a'' should be at least approximately 18 bp, and can be as long as 30, 40, 50, 60, 80, 100 or more than 100 bp. In one embodiment, a'' should be at least approximately 6 bp, and in one embodiment it is between approximately 8 and 16 bp. All other factors being equal, using internally tagged primers can cut the length of the required sequence reads by at least 6 bp, and by as much as 8 bp, 10 bp, 12 bp, 15 bp, or even 20 or 30 bp. This can result in significant savings in cost and time, and in improved accuracy. An example of internally tagged primers is given in Figure 12.
Primers with ligation adapter binding region
[000321] A problem with fragmented DNA is that, because it is short in length, the chance that a polymorphism is close to the end of a DNA strand is higher than for a long strand (for example, 101, Figure 10). Since PCR capture of a polymorphism requires a primer binding site of adequate length on both sides of the polymorphism, a significant number of DNA strands with the targeted polymorphism will be lost due to insufficient overlap between the primer and the target binding site.
In one embodiment, target DNA 101 may have ligation adapters 102 attached, and target primer 103 may have a region (cr) that is complementary to the ligation adapter tag (lt), appended upstream of the designed binding region (a) (see Figure 13); thus, in cases where the binding region (the region of 101 that is complementary to a) is shorter than the 18 bp typically required for hybridization, the region (cr) on the primer that is complementary to the library tag is able to increase the binding energy to a point where PCR can proceed. Note that any specificity that is lost due to a shorter binding region can be made up for by other PCR primers with suitably long target binding regions. Note that this embodiment can be used in combination with direct PCR, or any of the other methods described here, such as nested PCR, semi-nested PCR, hemi-nested PCR, one-sided nested or semi- or hemi-nested PCR, or other PCR protocols.
[000322] When using sequencing data to determine ploidy in combination with an analytical method that involves comparing observed allelic data with expected allelic distributions for various hypotheses, each additional read from alleles with a low depth of read will yield more information than a read from an allele with a high depth of read. Therefore, ideally, one would wish to see uniform depth of read (DOR), where each locus has a similar number of representative sequence reads. It is therefore desirable to minimize the DOR variance. In one embodiment, it is possible to decrease the coefficient of variation of the DOR (this can be defined as the standard deviation of the DOR divided by the average DOR) by increasing the annealing times. In some embodiments, annealing times can be longer than 2 minutes, longer than 4 minutes, longer than ten minutes, longer than 30 minutes, longer than an hour, or even longer. Since annealing is an equilibrium process, there is no limit to the improvement of DOR variance with increasing annealing times. In one embodiment, increasing the primer concentration can decrease the DOR variance.
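The DOR uniformity measure defined above (standard deviation of the DOR divided by the average DOR) can be illustrated with a minimal sketch. The function name and the per-locus depth values below are hypothetical and chosen only for illustration; the use of the population standard deviation is also an assumption, since the text does not specify which estimator is meant.

```python
from statistics import mean, pstdev

def dor_coefficient_of_variation(depths):
    """Return std(DOR) / mean(DOR) for a list of per-locus read depths.

    Uses the population standard deviation (an assumption; the source
    does not say whether a sample or population estimate is intended).
    """
    avg = mean(depths)
    if avg == 0:
        raise ValueError("average depth of read is zero")
    return pstdev(depths) / avg

# Hypothetical per-locus depths, for illustration only.
uniform_depths = [100, 100, 100, 100]   # perfectly uniform coverage
uneven_depths = [10, 50, 150, 190]      # same average depth, uneven coverage

print(dor_coefficient_of_variation(uniform_depths))
print(dor_coefficient_of_variation(uneven_depths))
```

Both hypothetical panels have the same average depth, so the metric isolates uniformity: a lower value means that reads are spread more evenly across loci.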
Diagnostic Box
[000323] In one embodiment, the present description comprises a diagnostic box that is capable of partially or completely carrying out any of the methods described in this description. In one embodiment, the diagnostic box can be located in a doctor's office, in a hospital laboratory, or in any suitable location reasonably close to the patient's point of care. The box may be able to run the entire method in a completely automated way, or the box may require that one or more steps be completed manually by a technician. In one embodiment, the box may be able to analyze at least the genotypic data measured in maternal plasma. In one embodiment, the box can be linked to means to transmit the genotypic data measured in the diagnostic box to an external computing facility that can then analyze the genotypic data, and possibly also generate a report. The diagnostic box may include a robotic unit that is capable of transferring aqueous or liquid samples from one container to another. It can also comprise various reagents, both solid and liquid. It can comprise a high-throughput sequencer and can comprise a computer.
Primer Kit
[000324] In some embodiments, a kit can be formulated that comprises a plurality of primers designed to achieve the methods described in this description. The primers can be outer forward and reverse primers or inner forward and reverse primers as described here; they can be primers that were designed to have low binding affinity for other primers in the kit, as described in the section on primer design; they can be hybrid capture probes or pre-circularized probes as described in the relevant sections; or some combination thereof. In one embodiment, a kit can be formulated for determining a ploidy state of a target chromosome in a gestating fetus, designed for use with the methods described here, the kit comprising a plurality of inner forward primers and optionally a plurality of inner reverse primers, and optionally outer forward primers and outer reverse primers, where each primer is designed to hybridize to the region of DNA immediately upstream and/or downstream of one of the polymorphic sites on the target chromosome, and optionally on additional chromosomes. In one embodiment, the primer kit can be used in combination with the diagnostic box described in this document.
DNA compositions
[000325] When performing computer analysis on sequencing data measured in a mixture of fetal and maternal blood to determine genomic information pertaining to the fetus, for example, the ploidy state of the fetus, it may be advantageous to measure allelic distributions at a set of alleles. Unfortunately, in many cases, such as when trying to determine the ploidy state of a fetus from the DNA mixture found in the plasma of a maternal blood sample, the amount of available DNA is not sufficient to directly measure the allelic distributions in the mixture with good fidelity. In such cases, amplification of the DNA mixture will provide sufficient numbers of DNA molecules that the desired allelic distributions can be measured with good fidelity. However, current amplification methods typically used in amplifying DNA for sequencing are often very biased, meaning that they do not amplify both alleles at a polymorphic locus by the same amount. A biased amplification can result in allelic distributions that are very different from the allelic distributions in the original mixture. For most purposes, highly accurate measurements of the relative amounts of alleles present at polymorphic loci are not necessary. In contrast, in one embodiment of the present description, amplification or enrichment methods that specifically enrich polymorphic alleles and preserve allelic ratios are advantageous.
[000326] Various methods described here can be used to preferentially enrich a DNA sample at a plurality of loci in a way that minimizes allelic bias. One example is using circularizing probes to target a plurality of loci, where the 3' and 5' ends of the pre-circularized probe are designed to hybridize to bases that are one or a few positions away from the polymorphic sites of the targeted allele. Another is to use PCR probes where the 3' end of the PCR probe is designed to hybridize to bases that are one or a few positions away from the polymorphic sites of the targeted allele. Another is to use a split-and-pool approach to create DNA mixtures where the preferentially enriched loci are enriched with low allelic bias without the drawbacks of direct multiplexing. Another is to use a hybrid capture approach where the capture probes are designed so that the region of the capture probe that hybridizes to the DNA flanking the polymorphic site of the target is separated from the polymorphic site by one or a small number of bases.
[000327] In the case where the allelic distributions measured at a set of polymorphic loci are used to determine the ploidy state of an individual, it is desirable to preserve the relative amounts of alleles in a DNA sample as it is prepared for genetic measurements. This preparation may involve whole genome amplification (WGA), targeted amplification, selective enrichment techniques, hybrid capture techniques, circularizing probes, or other methods meant to amplify the amount of DNA and/or selectively enhance the presence of DNA molecules that correspond to certain alleles.
[000328] In some embodiments of the present description, there is a set of DNA probes designed to target loci where the loci have maximal minor allele frequencies. In some embodiments of the present description, there is a set of probes that are designed to target loci where the loci have the maximum likelihood of the fetus having a highly informative SNP at those loci. In some embodiments of the present description, there is a set of probes that are designed to target loci where the probes are optimized for a given population subgroup. In some embodiments of the present description, there is a set of probes that are designed to target loci where the probes are optimized for a given mix of population subgroups. In some embodiments of the present description, there is a set of probes that are designed to target loci where the probes are optimized for a given pair of parents who are from different population subgroups that have different minor allele frequency profiles. In some embodiments of the present description, there is a circularized strand of DNA that comprises at least one base pair that annealed to a piece of DNA that is of fetal origin. In some embodiments of the present description, there is a circularized strand of DNA that comprises at least one base pair that annealed to a piece of DNA that is of placental origin.
In some embodiments of the present description, there is a circularized strand of DNA that circularized while at least some of the nucleotides were annealed to DNA that was of fetal origin. In some embodiments of the present description, there is a circularized strand of DNA that circularized while at least some of the nucleotides were annealed to DNA that was of placental origin. In some embodiments of the present description, there is a set of probes where some of the probes target short tandem repeats, and some of the probes target single nucleotide polymorphisms. In some embodiments, the loci are selected for the purpose of non-invasive prenatal diagnosis. In some embodiments, the probes are used for the purpose of non-invasive prenatal diagnosis. In some embodiments, the loci are targeted using a method that could include circularizing probes, MIPs, capture-by-hybridization probes, probes on a SNP array, or combinations thereof. In some embodiments, the probes are used as circularizing probes, MIPs, capture-by-hybridization probes, probes on a SNP array, or combinations thereof. In some embodiments, the loci are sequenced for the purpose of non-invasive prenatal diagnosis.
[000329] In the case where the relative informativeness of a sequence is greater when combined with relevant parental contexts, it follows that maximizing the number of sequence reads that contain an SNP for which the parental context is known can maximize the informativeness of the set of sequencing reads on the mixed sample. In one embodiment, the number of sequence reads that contain an SNP for which the parental contexts are known can be enhanced by using qPCR to preferentially amplify specific sequences. In one embodiment, the number of sequence reads that contain an SNP for which the parental contexts are known can be enhanced by using circularizing probes (for example, MIPs) to preferentially amplify specific sequences. In one embodiment, the number of sequence reads that contain an SNP for which the parental contexts are known can be enhanced by using a capture-by-hybridization method (for example, SURESELECT) to preferentially amplify specific sequences. Different methods can be used to enhance the number of sequence reads that contain an SNP for which the parental contexts are known. In one embodiment, the targeting can be accomplished by extension ligation, ligation without extension, capture by hybridization, or PCR.
[000330] In a sample of fragmented genomic DNA, a fraction of the DNA sequences map uniquely to individual chromosomes; other DNA sequences can be found on different chromosomes. Note that DNA found in plasma, whether of maternal or fetal origin, is typically fragmented, often into lengths of less than 500 bp. In a typical genomic sample, approximately 3.3% of mappable sequences will map to chromosome 13; 2.2% of mappable sequences will map to chromosome 18; 1.35% of mappable sequences will map to chromosome 21; 4.5% of mappable sequences will map to the X chromosome (in a female); 2.25% of mappable sequences will map to the X chromosome (in a male); and 0.73% of mappable sequences will map to the Y chromosome (in a male). These are the chromosomes that are most likely to be aneuploid in a fetus. Also, among short sequences, approximately 1 in 20 sequences will contain an SNP, using the SNPs contained in dbSNP. The proportion may well be higher, given that there may be many SNPs that have not been discovered.
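As a worked illustration of the figures above, the following sketch converts a total number of mappable reads into the expected number of reads on each chromosome of interest, and into the subset expected to contain an SNP. The function name and the example read count are hypothetical; the percentages and the "1 in 20" SNP rate are taken from the paragraph above and apply to an untargeted genomic sample.

```python
# Typical fraction of mappable sequences per chromosome, as quoted above.
TYPICAL_FRACTION = {
    "chr13": 0.033,
    "chr18": 0.022,
    "chr21": 0.0135,
    "chrX_female": 0.045,
    "chrX_male": 0.0225,
    "chrY_male": 0.0073,
}

# Approximate share of short sequences containing a dbSNP SNP, as quoted above.
SNP_READ_RATE = 1 / 20

def expected_reads(total_mappable_reads, chromosome):
    """Return (reads on chromosome, SNP-containing reads on chromosome)."""
    on_chromosome = total_mappable_reads * TYPICAL_FRACTION[chromosome]
    return on_chromosome, on_chromosome * SNP_READ_RATE

# Hypothetical run of one million mappable reads.
for name in ("chr13", "chr18", "chr21"):
    on_chrom, with_snp = expected_reads(1_000_000, name)
    print(f"{name}: ~{on_chrom:.0f} reads, ~{with_snp:.0f} containing an SNP")
```

The small share of reads that both fall on a chromosome of interest and contain an SNP is what motivates the targeting approaches discussed in this section.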
[000331] In one embodiment of the present description, targeting methods can be used to enhance the fraction of DNA in a DNA sample that maps to a given chromosome such that the fraction significantly exceeds the percentages listed above that are typical for genomic samples. In one embodiment of the present description, targeting methods can be used to enhance the fraction of DNA in a DNA sample such that the percentage of sequences containing an SNP is significantly greater than what can be found in typical genomic samples. In one embodiment of the present description, targeting methods can be used to target DNA from a chromosome or from a set of SNPs in a mixture of maternal and fetal DNA for the purpose of prenatal diagnosis.
[000332] Note that a method has been reported (US Patent 7,888,017) for determining fetal aneuploidy by counting the number of reads that map to a suspect chromosome, comparing it to the number of reads that map to a reference chromosome, and hypothesizing that an overabundance of reads on the suspect chromosome corresponds to a triploidy in the fetus at that chromosome. Those methods of prenatal diagnosis would not make use of targeting of any kind, nor do they describe the use of targeting for prenatal diagnosis.
[000333] By making use of targeting approaches in sequencing the mixed sample, it may be possible to achieve a certain level of accuracy with fewer sequence reads. Accuracy can refer to sensitivity, it can refer to specificity, or it can refer to some combination of them. The desired level of accuracy can be between 90% and 95%; it can be between 95% and 98%; it can be between 98% and 99%; it can be between 99% and 99.5%; it can be between 99.5% and 99.9%; it can be between 99.9% and 99.99%; it can be between 99.99% and 99.999%; it can be between 99.999% and 100%. Accuracy levels above 95% can be called high accuracy.
[000334] There are several methods published in the prior art that demonstrate how to determine the ploidy state of a fetus from a mixed sample of maternal and fetal DNA, for example: G.J.W. Liao et al., Clinical Chemistry 2011; 57(1), pp. 92-101. These methods focus on thousands of locations along each chromosome. The number of locations along a chromosome that can be targeted while still resulting in a high-accuracy ploidy determination in a fetus, for a given number of sequence reads, from a mixed DNA sample, is unexpectedly low. In an embodiment of the present description, an accurate ploidy determination can be made using targeted sequencing, using any targeting method, for example, qPCR, ligation-mediated PCR, other PCR methods, capture by hybridization, or circularizing probes, where the number of loci along a chromosome that need to be targeted can be between 5,000 and 2,000 loci; it can be between 2,000 and 1,000 loci; it can be between 1,000 and 500 loci; it can be between 500 and 300 loci; it can be between 300 and 200 loci; it can be between 200 and 150 loci; it can be between 150 and 100 loci; it can be between 100 and 50 loci; it can be between 50 and 20 loci; it can be between 20 and 10 loci. Optimally, it can be between 100 and 150 loci. The high level of accuracy can be achieved by targeting a small number of loci and performing an unexpectedly small number of sequence reads.
The number of reads can be between 100 million and 50 million reads; the number of reads can be between 50 million and 20 million reads; the number of reads can be between 20 million and 10 million reads; the number of reads can be between 10 million and 5 million reads; the number of reads can be between 5 million and 2 million reads; the number of reads can be between 2 million and 1 million reads; the number of reads can be between 1 million and 500,000 reads; the number of reads can be between 500,000 and 200,000 reads; the number of reads can be between 200,000 and 100,000 reads; the number of reads can be between 100,000 and 50,000 reads; the number of reads can be between 50,000 and 20,000 reads; the number of reads can be between 20,000 and 10,000 reads; the number of reads can be below 10,000 reads. Fewer reads are required for larger amounts of input DNA.
[000335] In some embodiments, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, where the percentage of sequences that map uniquely to chromosome 13 is greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, greater than 20%, greater than 25%, or greater than 30%. In some embodiments of the present description, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, where the percentage of sequences that map uniquely to chromosome 21 is greater than 2%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, greater than 20%, greater than 25%, or greater than 30%. In some embodiments of the present description, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, where the percentage of sequences that map uniquely to the X chromosome is greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, greater than 20%, greater than 25%, or greater than 30%. In some embodiments of the present description, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, where the percentage of sequences that map uniquely to the Y chromosome is greater than 1%, greater than 2%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, greater than 20%, greater than 25%, or greater than 30%.
[000336] In some embodiments, a composition is described comprising a mixture of DNA of fetal origin and DNA of maternal origin, where the percentage of sequences that map uniquely to a chromosome, and that contain at least one single nucleotide polymorphism, is greater than 0.2%, greater than 0.3%, greater than 0.4%, greater than 0.5%, greater than 0.6%, greater than 0.7%, greater than 0.8%, greater than 0.9%, greater than 1%, greater than 1.2%, greater than 1.4%, greater than 1.6%, greater than 1.8%, greater than 2%, greater than 2.2%, greater than 2.5%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, or greater than 20%, and where the chromosome is taken from the group of chromosomes 13, 18, 21, X, or Y. In some embodiments of the present description, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, where the percentage of sequences that map uniquely to a chromosome and that contain at least one single nucleotide polymorphism from a set of single nucleotide polymorphisms is greater than 0.15%, greater than 0.2%, greater than 0.3%, greater than 0.4%, greater than 0.5%, greater than 0.6%, greater than 0.7%, greater than 0.8%, greater than 0.9%, greater than 1%, greater than 1.2%, greater than 1.4%, greater than 1.6%, greater than 1.8%, greater than 2%, greater than 2.2%, greater than 2.5%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, or greater than 20%, where the chromosome is taken from the group of chromosomes 13, 18, 21, X, and Y, and where the number of single nucleotide polymorphisms in the set of single nucleotide polymorphisms is between 1 and 10, between 10 and 20, between 20 and 50, between 50 and 100, between 100 and 200, between 200 and 500, between 500 and 1,000, between 1,000 and 2,000, between 2,000 and 5,000, between 5,000 and 10,000, between 10,000 and 20,000, between 20,000 and 50,000, or between 50,000 and 100,000.
[000337] In theory, each cycle of amplification doubles the amount of DNA present; in reality, however, the degree of amplification per cycle is slightly less than two. In theory, amplification, including targeted amplification, will result in bias-free amplification of a DNA mixture; in reality, however, different alleles tend to be amplified to different extents. When DNA is amplified, the degree of allelic bias typically increases with the number of amplification steps. In some embodiments, the methods described here involve amplifying DNA with a low level of allelic bias. Since allelic bias compounds with each additional cycle, one can determine the per-cycle allelic bias by calculating the nth root of the overall bias, where n is the base-2 logarithm of the degree of enrichment. In some embodiments, there is a composition comprising a second mixture of DNA, where the second mixture of DNA has been preferentially enriched at a plurality of polymorphic loci from a first mixture of DNA, where the degree of enrichment is at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 1,000,000, and the ratio of alleles in the second mixture of DNA at each locus differs from the ratio of alleles at that locus in the first mixture of DNA by a factor that is, on average, less than 1,000%, 500%, 100%, 50%, 20%, 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, 0.02%, or 0.01%. In some embodiments, there is a composition comprising a second mixture of DNA, where the second mixture of DNA has been preferentially enriched at a plurality of polymorphic loci from a first mixture of DNA, where the per-cycle allelic bias for the plurality of polymorphic loci is, on average, less than 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, or 0.02%.
In some embodiments, the plurality of polymorphic loci comprises at least 10 loci, at least 20 loci, at least 50 loci, at least 100 loci, at least 200 loci, at least 500 loci, at least 1,000 loci, at least 2,000 loci, at least 5,000 loci, at least 10,000 loci, at least 20,000 loci, or at least 50,000 loci.
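The per-cycle bias calculation described above (the nth root of the overall bias, with n equal to the base-2 logarithm of the degree of enrichment) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the bias is expressed here as a fractional deviation of the allele ratio (for example, 0.10 for 10%), which is converted to a multiplicative factor before the root is taken. The source does not spell out this convention.

```python
import math

def per_cycle_allelic_bias(overall_bias, degree_of_enrichment):
    """Estimate the per-cycle allelic bias from the overall bias.

    overall_bias: fractional change in allele ratio after enrichment
        (assumed convention: 0.10 means the ratio shifted by 10%).
    degree_of_enrichment: fold enrichment; must be greater than 1.
    """
    if degree_of_enrichment <= 1:
        raise ValueError("degree of enrichment must be greater than 1")
    n = math.log2(degree_of_enrichment)      # effective number of doublings
    overall_factor = 1.0 + overall_bias      # multiplicative form of the bias
    return overall_factor ** (1.0 / n) - 1.0

# Hypothetical example: 1,024-fold enrichment (n = 10) with the allele
# ratio shifted by a factor of 1.1**10 overall.
bias = per_cycle_allelic_bias(1.1 ** 10 - 1.0, 1024)
print(f"per-cycle bias: {bias:.4f}")
```

Because the bias compounds multiplicatively, even a modest per-cycle bias grows quickly: in this hypothetical case a 10% bias per cycle corresponds to an overall shift of roughly 2.6-fold after ten doublings.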
Maximum likelihood estimates
[000338] Most methods known in the art for detecting the presence or absence of a biological phenomenon or medical condition involve the use of a single hypothesis rejection test, where a metric that is correlated with the condition is measured, and if the metric is on one side of a given threshold, the condition is present, while if the metric falls on the other side of the threshold, the condition is absent. A single hypothesis rejection test only considers the null distribution when deciding between the null and alternative hypotheses. Without taking into account the alternative distribution, the likelihood of each hypothesis given the observed data cannot be estimated, and therefore a confidence in the determination cannot be calculated. Hence, with a single hypothesis rejection test, one obtains a yes or no answer without a sense of the confidence associated with the specific case.
[000339] In some embodiments, the method described here is able to detect the presence or absence of a biological phenomenon or medical condition using a maximum likelihood method. This is a substantial improvement over a method using a single hypothesis rejection technique, as the threshold for calling the absence or presence of the condition can be adjusted as appropriate for each case. This is particularly relevant for diagnostic techniques that aim to determine the presence or absence of aneuploidy in a gestating fetus from genetic data available from the mixture of fetal and maternal DNA present in the free-floating DNA found in maternal plasma. This is because, as the fraction of fetal DNA in the plasma-derived fraction changes, the optimal threshold for calling aneuploidy versus euploidy changes. As the fetal fraction drops, the distribution of data that is associated with an aneuploidy becomes increasingly similar to the distribution of data that is associated with a euploidy.
[000340] The maximum likelihood estimation method uses the distributions associated with each hypothesis to estimate the likelihood of the data conditioned on each hypothesis. These conditional probabilities can then be converted into a hypothesis determination and a confidence. Similarly, the maximum a posteriori estimation method uses the same conditional probabilities as the maximum likelihood estimate, but it also incorporates the population prior when choosing the best hypothesis and determining confidence.
[000341] Thus, the use of a maximum likelihood estimation (MLE) technique, or the closely related maximum a posteriori (MAP) technique, provides two advantages: first, it increases the chance of a correct determination, and second, it allows a confidence to be calculated for each determination. In one embodiment, selecting the ploidy state corresponding to the hypothesis with the highest probability is performed using maximum likelihood estimates or maximum a posteriori estimates. In one embodiment, a method is described for determining the ploidy status of a gestating fetus that involves taking any method known in the art that uses a single hypothesis rejection technique and reformulating it so that it uses an MLE or MAP technique. Some examples of methods that can be significantly improved by applying these techniques can be found in US Patent No. 8,008,018, US Patent No. 7,888,017 or US Patent No. 7,332,277.
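The step from per-hypothesis likelihoods to a determination with a confidence can be illustrated with a minimal sketch. The function name, the dictionary layout and the example hypothesis labels are illustrative assumptions and are not taken from the description; a uniform prior gives the MLE determination and a non-uniform prior gives the MAP determination.

```python
import math

def posterior_from_log_likelihoods(log_liks, priors=None):
    """Turn per-hypothesis log likelihoods into a call plus a confidence.

    log_liks: dict mapping hypothesis name -> log P(D | H).
    priors:   optional dict mapping hypothesis name -> P(H); when omitted,
              a uniform prior is used, which reduces MAP to MLE.
    Returns (best_hypothesis, confidence, posterior_dict).
    """
    names = list(log_liks)
    if priors is None:
        priors = {h: 1.0 / len(names) for h in names}
    scores = {h: log_liks[h] + math.log(priors[h]) for h in names}
    top = max(scores.values())
    # log-sum-exp keeps the normalization stable for very negative scores
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    post = {h: math.exp(s - log_norm) for h, s in scores.items()}
    best = max(post, key=post.get)
    return best, post[best], post
```

With the same likelihoods, a strong prior can change the selected hypothesis, which is the difference between the MLE and MAP determinations described above.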
[000342] In one embodiment, a method is described for determining the presence or absence of fetal aneuploidy in a maternal plasma sample comprising fetal and maternal genomic DNA, the method comprising: obtaining a maternal plasma sample; measuring the DNA fragments found in the plasma sample with a high-throughput sequencer; mapping the sequences to the chromosomes and determining the number of sequence reads that map to each chromosome; calculating the fraction of fetal DNA in the plasma sample; calculating an expected distribution of the amount of a target chromosome that would be expected to be present if the target chromosome were euploid, and one or more expected distributions that would be expected if that chromosome were aneuploid, using the fetal fraction and the number of sequence reads that map to one or more reference chromosomes that are expected to be euploid; and using an MLE or MAP to determine which distribution is most likely to be correct, thus indicating the presence or absence of fetal aneuploidy. In one embodiment, measuring DNA from plasma may involve conducting massively parallel "shotgun" sequencing. In one embodiment, measuring DNA from the plasma sample may involve sequencing DNA that has been preferentially enriched, for example, through targeted amplification, at a plurality of polymorphic or non-polymorphic loci. The plurality of loci can be designed to target one or a small number of chromosomes suspected of being aneuploid and one or a small number of reference chromosomes. The purpose of preferential enrichment is to increase the number of sequence reads that are informative for ploidy determination.
Computer-based methods of ploidy determination
[000343] Described here is a method for determining the fetal ploidy state given sequence data. In some embodiments, this sequence data can be measured on a high-throughput sequencer. In some embodiments, sequence data can be measured on DNA that originated from free DNA isolated from maternal blood, where the free DNA comprises some DNA of maternal origin and some DNA of fetal/placental origin. This section describes an embodiment of the present description in which the ploidy state of the fetus is determined assuming that the fraction of fetal DNA in the mixture that was analyzed is not known and will be estimated from the data. Also described is an embodiment in which the fraction of fetal DNA ("fetal fraction"), or the percentage of fetal DNA in the mixture, can be measured by another method and is assumed to be known when determining the fetal ploidy state. In some embodiments, the fetal fraction can be calculated using only the genotyping measurements made on the maternal blood sample itself, which is a mixture of fetal and maternal DNA. In some embodiments, the fraction can also be calculated using the measured, or otherwise known, genotype of the mother and/or the measured, or otherwise known, genotype of the father. In another embodiment, the fetal ploidy state can be determined solely on the basis of the calculated fraction of fetal DNA for the chromosome in question compared with the calculated fraction of fetal DNA for the assumed reference chromosome.
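The read-count embodiment of paragraph [000342] can be sketched as follows. This is a simplified illustration under stated assumptions, not the method as claimed: the read count on the target chromosome is modeled as a plain binomial, `q_euploid` (the share of reads expected on the target chromosome when it is euploid, as would be derived from reference chromosomes) is a hypothetical input, and all function names are invented for the example.

```python
import math

def binom_logpmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def target_fraction(q_euploid, fetal_fraction, fetal_copies):
    """Expected share of reads on the target chromosome.

    Maternal DNA contributes 2 copies of the target chromosome and fetal DNA
    contributes fetal_copies, so the target's dosage is rescaled accordingly.
    """
    scale = ((1 - fetal_fraction) * 2 + fetal_fraction * fetal_copies) / 2
    return q_euploid * scale / (q_euploid * scale + (1 - q_euploid))

def call_by_read_counts(k_target, n_total, q_euploid, fetal_fraction):
    """Pick the copy number whose expected distribution best explains the count."""
    hypotheses = {"monosomy": 1, "disomy": 2, "trisomy": 3}
    log_liks = {
        name: binom_logpmf(
            k_target, n_total,
            target_fraction(q_euploid, fetal_fraction, copies))
        for name, copies in hypotheses.items()
    }
    return max(log_liks, key=log_liks.get), log_liks
```

Because the expected distributions depend on the fetal fraction, the same read count can favor different hypotheses at different fetal fractions, which is why a fixed threshold is not used.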
[000344] In the preferred embodiment, it is assumed that, for a particular chromosome, N SNPs are observed and analyzed, for which one has:
- A set of NR free DNA sequence measurements $S = (s_1, \ldots, s_{NR})$. Since this method uses SNP measurements, all sequence data that corresponds to non-polymorphic loci can be disregarded. In a simplified version, when there are (A, B) counts at each SNP, where A and B correspond to the two alleles present at a given locus, S can be written as $S = ((a_1, b_1), \ldots, (a_N, b_N))$, where $a_i$ is the A count at SNP i, $b_i$ is the B count at SNP i, and $\sum_{i=1}^{N} (a_i + b_i) = NR$.
- Parental data consisting of:
o genotypes from a SNP microarray or other intensity-based genotyping platform: mother $M = (m_1, \ldots, m_N)$, father $F = (f_1, \ldots, f_N)$, where $m_i, f_i \in \{AA, AB, BB\}$;
o AND/OR sequence data measurements: NRM measurements from the mother $SM = (sm_1, \ldots, sm_{NRM})$ and NRF measurements from the father $SF = (sf_1, \ldots, sf_{NRF})$. Similar to the simplification above, when there are (A, B) counts at each SNP, $SM = ((am_1, bm_1), \ldots, (am_N, bm_N))$ and $SF = ((af_1, bf_1), \ldots, (af_N, bf_N))$.
[000345] Collectively, the data of the mother, father and child are denoted as D = (M, F, SM, SF, S). Note that parental data is desired and increases the accuracy of the algorithm, but is not necessary, especially for the father's data. This means that even in the absence of data from the mother and / or father, it is possible to obtain very accurate copy number results.
[000346] It is possible to derive the best copy number estimate (H*) by maximizing the log likelihood of the data, LIK(D | H), over all the hypotheses (H) considered. In particular, it is possible to determine the relative probability of each ploidy hypothesis using the joint distribution model and the allele counts measured in the prepared sample, and to use those relative probabilities to determine the hypothesis most likely to be correct, as follows:
$H^* = \arg\max_H \mathrm{LIK}(D \mid H)$
[000347] Similarly, the a posteriori probability of a hypothesis given the data can be written as:
$P(H \mid D) \propto \exp(\mathrm{LIK}(D \mid H)) \cdot \mathrm{priorprob}(H)$
[000348] where priorprob(H) is the a priori probability assigned to each hypothesis H, based on the model design and prior knowledge.
[000349] It is also possible to use priors to find the maximum a posteriori estimate:
$H^*_{MAP} = \arg\max_H \left[ \mathrm{LIK}(D \mid H) + \log \mathrm{priorprob}(H) \right]$
[000350] In one embodiment, the copy number hypotheses that can be considered are:
• Monosomy:
o maternal H10 (one copy from the mother)
o paternal H01 (one copy from the father)
• Disomy: H11 (one copy each from the mother and the father)
• Simple trisomy, no crossovers considered:
o Maternal: H21_matched (two identical copies from the mother, one copy from the father), H21_unmatched (both copies from the mother, one copy from the father)
o Paternal: H12_matched (one copy from the mother, two identical copies from the father), H12_unmatched (one copy from the mother, both copies from the father)
• Composite trisomy, allowing for crossovers (using a joint distribution model):
o maternal H21 (two copies from the mother, one from the father)
o paternal H12 (one copy from the mother, two copies from the father)
[000351] In other embodiments, other ploidy states, such as nullisomy (H00), uniparental disomy (H20 and H02), and tetrasomy (H04, H13, H22, H31 and H40) can be considered.
[000352] If there are no crossovers, each trisomy, whether its origin was mitosis, meiosis I or meiosis II, would be either a matched (coincident) or an unmatched (non-coincident) trisomy. Due to crossovers, a true trisomy is usually a combination of the two. First, a method for deriving likelihoods for the simple hypotheses is described. Then, a method for deriving likelihoods for the composite hypotheses is described, combining the individual SNP likelihoods with crossovers.
[000353] LIK(D | H) for a Simple Hypothesis
[000354] In one embodiment, LIK(D | H) can be determined for simple hypotheses as follows. For a simple hypothesis H, LIK(D | H), the log likelihood of hypothesis H over a whole chromosome, can be calculated as the sum of the log likelihoods of the individual SNPs, assuming a known or derived child fraction cf: $\mathrm{LIK}(D \mid H) = \sum_i \mathrm{LIK}(D \mid H, cf, i)$. In one embodiment, it is possible to derive cf from the data.
[000355] This hypothesis does not assume any linkage between SNPs, and so it does not use a joint distribution model.
[000356] In some embodiments, the log likelihood can be determined on a per-SNP basis. At a particular SNP i, assuming fetal ploidy hypothesis H and fetal DNA fraction cf, the log likelihood of the observed data D is defined as:
$\mathrm{LIK}(D \mid H, cf, i) = \log \sum_{m, f, c} p(c \mid m, f, H)\, p(m \mid i)\, p(f \mid i)\, P(D \mid m, f, c, H, cf, i)$
where m are the possible true mother genotypes, f are the possible true father genotypes, with $m, f \in \{AA, AB, BB\}$, and c are the possible child genotypes given hypothesis H. In particular, for monosomy $c \in \{A, B\}$, for disomy $c \in \{AA, AB, BB\}$, and for trisomy $c \in \{AAA, AAB, ABB, BBB\}$.
[000357] Genotype prior frequency: p(m | i) is the general a priori probability of mother genotype m at SNP i, based on the known population frequency at SNP i, denoted $pA_i$. In particular: $p(AA \mid pA_i) = (pA_i)^2$, $p(AB \mid pA_i) = 2\, pA_i (1 - pA_i)$, $p(BB \mid pA_i) = (1 - pA_i)^2$.
[000358] The probability of the father's genotype, p(f | i), can be determined in an analogous way.
[000359] True child probability: p(c | m, f, H) is the probability of obtaining true child genotype c, given parents m and f, and assuming hypothesis H, which can be easily calculated. For example, for H11, H21 coincident and H21 non-coincident, p(c | m, f, H) is given below.
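As an illustration of how p(c | m, f, H) could be enumerated for the three hypotheses just named, the sketch below assumes Mendelian inheritance with each parental homolog equally likely and no crossovers. The function name and the string encoding of genotypes are assumptions made for the example.

```python
from collections import Counter
from itertools import product

def child_genotype_probs(m, f, hypothesis):
    """Enumerate P(c | m, f, H) for genotypes written as strings like 'AB'.

    Supported hypotheses (maternal cases only, no crossovers):
      'H11'           one maternal copy, one paternal copy
      'H21_matched'   the same maternal homolog twice, one paternal copy
      'H21_unmatched' both maternal homologs, one paternal copy
    """
    if hypothesis == "H11":
        maternal = list(m)                 # either maternal allele
    elif hypothesis == "H21_matched":
        maternal = [a + a for a in m]      # one maternal allele, duplicated
    elif hypothesis == "H21_unmatched":
        maternal = [m]                     # both maternal alleles together
    else:
        raise ValueError(f"unsupported hypothesis: {hypothesis}")

    counts = Counter()
    for mat, pat in product(maternal, f):  # each paternal allele equally likely
        counts["".join(sorted(mat + pat))] += 1
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}
```

For example, a heterozygous mother and a homozygous AA father give an AAB child with certainty under the non-coincident hypothesis, but AAA or ABB with equal probability under the coincident hypothesis.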
[000360] Data likelihood: P(D | m, f, c, H, cf, i) is the probability of the data D at SNP i, given the true mother genotype m, the true father genotype f, the true child genotype c, hypothesis H and child fraction cf. It can be factored into the mother, father and child data likelihoods as follows:
$P(D \mid m, f, c, H, cf, i) = P(SM \mid m, i)\, P(M \mid m, i)\, P(SF \mid f, i)\, P(F \mid f, i)\, P(S \mid m, c, H, cf, i)$
[000361] Mother SNP array data likelihood: the probability of the mother's SNP array genotype data $m_i$ at SNP i given the true genotype m, assuming that the SNP array genotypes are correct, is simply: $P(M \mid m, i) = 1$ if $m_i = m$, and $0$ otherwise.
[000362] Mother sequence data likelihood: the probability of the mother's sequence data at SNP i, in the case of counts $SM_i = (am_i, bm_i)$, with no extra noise or bias involved, is the binomial probability defined as $P(SM \mid m, i) = P_{X \mid m}(am_i)$, where $X \mid m \sim \mathrm{Binom}(p_m(A), am_i + bm_i)$, with $p_m(A)$ defined as: $p_{AA}(A) = 1$, $p_{AB}(A) = 0.5$, $p_{BB}(A) = 0$.
[000363] Father data likelihood: a similar equation applies for the father data likelihood.
[000364] Note that it is possible to determine the child's genotype without the parental data, especially the father's data. For example, if no father genotype data F is available, one can simply use P(F | f, i) = 1. If no father sequence data SF is available, one can simply use P(SF | f, i) = 1.
[000365] In some embodiments, the method involves building a joint distribution model for the expected allele counts at a plurality of polymorphic loci on the chromosome for each ploidy hypothesis; a method for accomplishing this is described here. Free fetal DNA data likelihood: P(S | m, c, H, cf, i) is the probability of the free fetal DNA sequence data at SNP i, given the true mother genotype m, the true child genotype c, the child copy number hypothesis H, and assuming child fraction cf. It is in fact the probability of the sequence data S at SNP i, given the true probability of A content at SNP i, g(m, c, cf, H):
$P(S \mid m, c, H, cf, i) = P(S \mid g(m, c, cf, H), i)$
[000366] For counts, where $S_i = (a_i, b_i)$, with no extra noise or bias in the data involved, $P(S \mid g(m, c, cf, H), i) = P_X(a_i)$, where $X \sim \mathrm{Binom}(p(A), a_i + b_i)$ with $p(A) = g(m, c, cf, H)$. In a more complex case where the exact alignment and the (A, B) counts per SNP are not known, P(S | g(m, c, cf, H), i) is a combination of integrated binomials.
[000367] True probability of A content: g(m, c, cf, H), the true probability of A content at SNP i in this mother/child mixture, assuming true mother genotype m, true child genotype c, and overall child fraction cf, is defined as:
$g(m, c, cf, H) = \dfrac{\#A(m)\,(1 - cf) + \#A(c)\, cf}{n_m (1 - cf) + n_c\, cf}$
where #A(g) = number of A's in genotype g, $n_m = 2$ is the somy of the mother and $n_c$ is the ploidy of the child under hypothesis H (1 for monosomy, 2 for disomy, 3 for trisomy).
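A minimal sketch of the A content and the resulting binomial likelihood of the plasma counts follows. It assumes that g(m, c, cf, H) is the copy-number-weighted mixture of maternal and child A alleles, with the hypothesis H entering only through the child's ploidy; the clamp `eps`, which keeps the logarithm finite when the expected A content is exactly 0 or 1, is an added assumption rather than part of the description.

```python
import math

def a_content(m, c, cf):
    """Expected fraction of A reads in the mixture, g(m, c, cf, H).

    m, c: genotype strings such as 'AB' or 'AAB'; the child's ploidy under
    hypothesis H is len(c), and the mother's somy is len(m) (normally 2).
    cf:   child fraction, between 0 and 1.
    """
    numerator = m.count("A") * (1 - cf) + c.count("A") * cf
    denominator = len(m) * (1 - cf) + len(c) * cf
    return numerator / denominator

def plasma_log_likelihood(a, b, m, c, cf, eps=1e-6):
    """Binomial log probability of observing a A-reads and b B-reads."""
    p = min(max(a_content(m, c, cf), eps), 1 - eps)  # clamp: an assumption
    n = a + b
    return (math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(b + 1)
            + a * math.log(p) + b * math.log(1 - p))
```

For instance, a homozygous AA mother carrying a heterozygous AB child at a 10 percent child fraction gives an expected A content of 0.95.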
[000368] Using the joint distribution model: LIK(D | H) for a Composite Hypothesis
[000369] In some embodiments, the method involves building a joint distribution model for the expected allele counts at the plurality of polymorphic loci on the chromosome for each ploidy hypothesis; a method for accomplishing this is described here. In many cases, a trisomy is not purely coincident or non-coincident, due to crossovers, so in this section results are derived for the composite hypotheses H21 (maternal trisomy) and H12 (paternal trisomy), which combine coincident and non-coincident trisomy while accounting for possible crossovers.
[000370] In the case of trisomy, if there were no crossovers, the trisomy would simply be a coincident or a non-coincident trisomy. Coincident trisomy is when the child inherits two copies of the identical chromosome segment from one parent. Non-coincident trisomy is when the child inherits one copy of each homologous chromosome segment from one parent. Due to crossovers, some segments of a chromosome may have coincident trisomy, and other segments may have non-coincident trisomy. This section describes how to build a joint distribution model for heterozygosity rates for a set of alleles; that is, for the expected allele counts at a number of loci for one or more hypotheses.
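[000371] Assume that at SNP i, LIK(D | Hm, i) is the fit for the coincident hypothesis Hm, LIK(D | Hu, i) is the fit for the non-coincident hypothesis Hu, and pc(i) is the probability of a crossover between SNPs i-1 and i. The total likelihood can then be calculated as:
$\mathrm{LIK}(D \mid H) = \log \sum_{E \in \{Hm, Hu\}} \exp\big(\mathrm{LIK}(D \mid E, 1{:}N)\big)$
where LIK(D | E, 1:N) is the likelihood of ending in hypothesis E for SNPs 1:N, and E is the hypothesis at the last SNP, $E \in \{Hm, Hu\}$. Recursively, one can calculate:
$\mathrm{LIK}(D \mid E, 1{:}i) = \mathrm{LIK}(D \mid E, i) + \log\big[\exp(\mathrm{LIK}(D \mid E, 1{:}i-1))\,(1 - pc(i)) + \exp(\mathrm{LIK}(D \mid {\sim}E, 1{:}i-1))\, pc(i)\big]$
where ~E is the hypothesis other than E (not E), the hypotheses considered being Hm and Hu. In particular, the likelihood of SNPs 1:i is calculated from the likelihood of SNPs 1 to (i-1), either with the same hypothesis and no crossover or with the other hypothesis and a crossover, multiplied by the likelihood of SNP i.
[000372] For SNP 1, i = 1, LIK(D | E, 1:1) = LIK(D | E, 1).
[000373] For SNP 2, i = 2, $\mathrm{LIK}(D \mid E, 1{:}2) = \mathrm{LIK}(D \mid E, 2) + \log\big[\exp(\mathrm{LIK}(D \mid E, 1))\,(1 - pc(2)) + \exp(\mathrm{LIK}(D \mid {\sim}E, 1))\, pc(2)\big]$, and so on for i = 3:N.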
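The recursion over SNPs with possible crossovers can be sketched as a two-state forward pass carried out in log space. This is an illustrative reading of the description above: the per-SNP log likelihoods and crossover probabilities are assumed to be supplied by the caller, the crossover probabilities are assumed to lie strictly between 0 and 1, and the function names are invented for the example.

```python
import math

def logaddexp(x, y):
    """Stable log(exp(x) + exp(y))."""
    hi, lo = max(x, y), min(x, y)
    return hi + math.log1p(math.exp(lo - hi))

def composite_log_likelihood(lik_m, lik_u, pc):
    """Combine coincident (Hm) and non-coincident (Hu) fits across SNPs.

    lik_m[i], lik_u[i]: log likelihood of SNP i under Hm and Hu.
    pc[i]: probability of a crossover between SNP i-1 and SNP i
           (pc[0] is unused; other entries must be in (0, 1)).
    Returns the log likelihood of the composite hypothesis.
    """
    cur_m, cur_u = lik_m[0], lik_u[0]          # SNP 1: no transition term
    for i in range(1, len(lik_m)):
        stay, switch = math.log1p(-pc[i]), math.log(pc[i])
        new_m = lik_m[i] + logaddexp(cur_m + stay, cur_u + switch)
        new_u = lik_u[i] + logaddexp(cur_u + stay, cur_m + switch)
        cur_m, cur_u = new_m, new_u
    return logaddexp(cur_m, cur_u)             # sum over the final hypothesis
```

Working in log space avoids underflow when thousands of SNP likelihoods are multiplied along a chromosome.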
[000374] In some embodiments, the child fraction can be determined. The child fraction can refer to the proportion of sequences in a mixture of DNA that originate from the child. In the context of non-invasive prenatal diagnosis, the child fraction can refer to the proportion of sequences in the maternal plasma that originate from the fetus or from the portion of the placenta with the fetal genotype. It can refer to the child fraction in a DNA sample that was prepared from maternal plasma and that may have been enriched in fetal DNA. One purpose of determining the child fraction in a DNA sample is for use in an algorithm that can make ploidy determinations on the fetus; the child fraction could therefore refer to any DNA sample that was analyzed by sequencing for the purpose of non-invasive prenatal diagnosis.
[000375] Some of the algorithms presented in this description that are part of a method of non-invasive prenatal aneuploidy diagnosis assume a known child fraction, which may not always be the case. In one embodiment, it is possible to find the most likely child fraction by maximizing the likelihood of disomy on selected chromosomes, with or without the presence of parental data.
[000376] In particular, let LIK(D | H11, cf, chr) be the log likelihood as described above, for the disomy hypothesis and for child fraction cf on chromosome chr. For the selected chromosomes in Cset (usually 1:16), which are assumed to be euploid, the total likelihood is:
$\mathrm{LIK}(cf) = \sum_{chr \in Cset} \mathrm{LIK}(D \mid H11, cf, chr)$
[000377] The most likely child fraction (cf*) is derived as $cf^* = \arg\max_{cf} \mathrm{LIK}(cf)$.
[000378] Any set of chromosomes can be used. It is also possible to derive the child fraction without assuming euploidy on the reference chromosomes. Using this method, it is possible to determine the child fraction for any of the following situations: (1) there is array data on the parents and shotgun sequencing data on the maternal plasma; (2) there is array data on the parents and targeted sequencing data on the maternal plasma; (3) there is targeted sequencing data on both the parents and the maternal plasma; (4) there is targeted sequencing data on both the mother and the maternal plasma fraction; (5) there is targeted sequencing data on the maternal plasma fraction; (6) other combinations of parental and child fraction measurements.
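The child fraction search described above can be sketched as a simple grid search. The grid bounds and resolution are arbitrary illustrative choices, and `loglik` is a hypothetical caller-supplied function standing in for LIK(D | H11, cf, chr); neither is specified by the description.

```python
def estimate_child_fraction(loglik, chromosomes, grid=None):
    """Return the child fraction that maximizes the summed disomy log likelihood.

    loglik:      callable (cf, chromosome) -> LIK(D | H11, cf, chromosome);
                 supplied by the caller (hypothetical interface).
    chromosomes: the reference set assumed to be euploid.
    grid:        candidate child fractions; defaults to 0.010 .. 0.499.
    """
    if grid is None:
        grid = [i / 1000 for i in range(10, 500)]

    def total(cf):
        return sum(loglik(cf, chrom) for chrom in chromosomes)

    return max(grid, key=total)


# Toy usage: a made-up concave log likelihood peaking at cf = 0.12.
toy_loglik = lambda cf, chrom: -((cf - 0.12) ** 2)
estimated = estimate_child_fraction(toy_loglik, chromosomes=range(1, 17))
```

A finer grid or a one-dimensional optimizer could replace the coarse grid once the region of the maximum is known.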
[000379] In some embodiments, the computer-based method may incorporate data exclusions (allele dropouts); this can result in more accurate ploidy determinations. Elsewhere in this description it is assumed that the probability of obtaining an A is a direct function of the true mother genotype, the true child genotype, the child fraction in the mixture, and the child copy number. It is also possible that alleles of the mother or the child are excluded; for example, instead of measuring a true child AB in the mixture, it may be the case that only sequences mapping to the A allele are measured. One can denote the parental exclusion rate for genomic (SNP array) data as dpg, the parental exclusion rate for sequence data as dps and the child exclusion rate for sequence data as dcs. In some embodiments, the mother's exclusion rate can be assumed to be zero, and the child exclusion rates are relatively low; in this case, the results are not severely affected by exclusions. In some embodiments, the possibility of allele exclusions may be large enough to have a significant effect on the predicted ploidy determination. For such a case, allele exclusions have been incorporated into the algorithm here.
[000380] Parental SNP array data exclusions: For the mother's genomic data M, suppose that the genotype after exclusion is md; then
$P(M \mid m, i) = \sum_{md} P(M \mid md, i)\, P(md \mid m)$
where P(M | md, i) is the array data likelihood as defined previously, and P(md | m) is the probability of genotype md after possible exclusion given the true genotype m, defined as below, for exclusion rate d.
[000381] A similar equation applies for the father's SNP array data.
[000382] Parental sequence data exclusions: For the mother's sequence data SM,
$P(SM \mid m, i) = \sum_{md} P(md \mid m)\, P_{X \mid md}(am_i)$
where P(md | m) is defined as in the previous section and $P_{X \mid md}(am_i)$, the probability from a binomial distribution, is defined as previously in the parent data likelihood section. A similar equation applies for the paternal sequence data.
[000383] Free DNA sequence data exclusions:
$P(S \mid m, c, H, cf, i) = \sum_{md, cd} P(md \mid m)\, P(cd \mid c)\, P(S \mid g(md, cd, cf, H), i)$
where P(S | g(md, cd, cf, H), i) is as defined in the free DNA data likelihood section.
[000384] In one embodiment, p(md | m) is the probability of the observed mother genotype md given the true mother genotype m, assuming exclusion rate dps, and p(cd | c) is the probability of the observed child genotype cd given the true child genotype c, assuming exclusion rate dcs. If nAT = number of A alleles in the true genotype c, nAD = number of A alleles in the observed genotype cd, where nAT ≥ nAD, and similarly nBT = number of B alleles in the true genotype c, nBD = number of B alleles in the observed genotype cd, where nBT ≥ nBD, and d = exclusion rate, then
$p(cd \mid c) = \binom{nAT}{nAD}\, d^{\,nAT - nAD} (1 - d)^{nAD} \cdot \binom{nBT}{nBD}\, d^{\,nBT - nBD} (1 - d)^{nBD}$
[000385] In one embodiment, the computer-based method can incorporate consistent and random bias. In an ideal world, there is no consistent per-SNP sampling bias or random noise (beyond the binomial distribution variation) in the number of sequence counts. In particular, at SNP i, for true mother genotype m, true child genotype c and child fraction cf, with X = the number of A's in the set of (A + B) reads at SNP i, X behaves as X ~ Binomial(p, A + B), where p = g(m, c, cf, H) = the true probability of A content.
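The exclusion probability of paragraph [000384] can be sketched as follows, assuming that each allele copy is excluded independently with rate d, which is what the allele counts nAT, nAD, nBT and nBD suggest. The function name and the string encoding of genotypes (an empty string meaning that every copy was excluded) are assumptions made for the example.

```python
from math import comb

def exclusion_prob(true_genotype, observed_genotype, d):
    """P(observed | true) when each allele copy is excluded with rate d.

    Genotypes are strings of 'A' and 'B', e.g. 'AB' or 'AAB'; the observed
    genotype may be shorter than the true one, or empty.
    """
    n_at, n_bt = true_genotype.count("A"), true_genotype.count("B")
    n_ad, n_bd = observed_genotype.count("A"), observed_genotype.count("B")
    if n_ad > n_at or n_bd > n_bt:
        return 0.0  # alleles cannot appear through exclusion
    prob_a = comb(n_at, n_ad) * d ** (n_at - n_ad) * (1 - d) ** n_ad
    prob_b = comb(n_bt, n_bd) * d ** (n_bt - n_bd) * (1 - d) ** n_bd
    return prob_a * prob_b
```

Summing this probability over every possible observed genotype returns 1, which is a quick sanity check on the exclusion model.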
[000386] In one embodiment, the computer-based method can incorporate random bias. As is often the case, suppose there is a bias in the measurements, so that the probability of getting an A at this SNP is equal to q, which is slightly different from p as defined above. How much q differs from p depends on the accuracy of the measurement process and a number of other factors, and can be quantified by the standard deviation of q about p. In one embodiment, it is possible to model q as having a beta distribution, with parameters α, β chosen so that the mean of that distribution is centered at p, with some specified standard deviation s. In particular, this gives $X \mid q \sim \mathrm{Bin}(q, D_i)$, where $q \sim \mathrm{Beta}(\alpha, \beta)$. If $E(q) = p$ and $V(q) = s^2$ are assumed, the parameters can be derived as $\alpha = pN$, $\beta = (1 - p)N$, where $N = \dfrac{p(1 - p)}{s^2} - 1$.
[000387] This is the definition of a beta-binomial distribution, where sampling is done from a binomial distribution with a variable parameter q, and q follows a beta distribution with mean p. Thus, in a bias-free setting, at SNP i, the probability of the mother's sequence data (SM) assuming the true mother genotype (m), given the count of A's in the mother's sequence at SNP i ($am_i$) and the count of B's in the mother's sequence at SNP i ($bm_i$), can be calculated as: $P(SM \mid m, i) = P_{X \mid m}(am_i)$, where $X \mid m \sim \mathrm{Binom}(p_m(A), am_i + bm_i)$.
[000388] Now, including random bias with standard deviation s, this becomes: $X \mid m \sim \mathrm{BetaBinom}(p_m(A), am_i + bm_i, s)$.
[000389] In the bias-free case, the probability of the maternal plasma DNA sequence data (S) assuming the true mother genotype (m), the true child genotype (c), the child fraction (cf), and the child hypothesis H, given the count of A's in the free DNA sequence at SNP i ($a_i$) and the count of B's in the free DNA sequence at SNP i ($b_i$), can be calculated as: $P(S \mid m, c, cf, H, i) = P_X(a_i)$, where $X \sim \mathrm{Binom}(p(A), a_i + b_i)$ with $p(A) = g(m, c, cf, H)$.
[000390] In one embodiment, including random bias with standard deviation s, this becomes $X \sim \mathrm{BetaBinom}(p(A), a_i + b_i, s)$, where the amount of extra variation is specified by the deviation parameter s, or equivalently N. The smaller the value of s (or the larger the value of N), the closer this distribution is to the regular binomial distribution. It is possible to estimate the amount of bias, that is, to estimate N above, from the unambiguous contexts AA|AA, BB|BB, AA|BB, BB|AA and to use the estimated N in the above likelihood. Depending on the behavior of the data, N can be made constant regardless of the read depth $a_i + b_i$, or a function of $a_i + b_i$, making the bias smaller for greater read depths.
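The beta-binomial likelihood with mean p and deviation s can be sketched using only the standard library. The parameterization follows the relations α = pN, β = (1 − p)N with N = p(1 − p)/s² − 1; the function names are illustrative, and p is assumed to lie strictly between 0 and 1 with s small enough that N is positive.

```python
import math

def beta_params(p, s):
    """Beta parameters (alpha, beta) with mean p and standard deviation s."""
    n = p * (1 - p) / s ** 2 - 1
    if n <= 0:
        raise ValueError("s is too large for a beta distribution with mean p")
    return p * n, (1 - p) * n

def log_beta(x, y):
    """Log of the beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def betabinom_logpmf(k, n, p, s):
    """Log probability of k A-reads out of n reads under BetaBinom(p, n, s)."""
    alpha, beta = beta_params(p, s)
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
```

As s shrinks, N grows and the distribution approaches the ordinary binomial with the same mean, consistent with the behavior described above.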
[000391] In one embodiment, the computer-based method can incorporate consistent per-SNP bias. Due to artifacts of the sequencing process, some SNPs may have consistently lower or higher counts regardless of the true amount of A content. Suppose that SNP i consistently adds a bias of $w_i$ percent to the number of A counts. In some embodiments, the bias can be estimated from a set of training data derived under the same conditions, and added back into the parent sequence data estimate as: $P(SM \mid m, i) = P_{X \mid m}(am_i)$ where $X \mid m \sim \mathrm{BetaBinom}(p_m(A) + w_i, am_i + bm_i, s)$, and into the free DNA sequence data likelihood estimate as: $P(S \mid m, c, cf, H, i) = P_X(a_i)$ where $X \sim \mathrm{BetaBinom}(p(A) + w_i, a_i + b_i, s)$.
[000392] In some embodiments, the method can be written to specifically take into account additional noise, differential sample quality, differential SNP quality, and random sampling bias. An example of this is given here. This method proved to be particularly useful in the context of data generated using the massively multiplexed mini-PCR protocol, and was used in Experiments 1 to 13. The method involves several steps, each of which introduces a different kind of noise and/or bias into the final model:
(1) It is assumed that the first sample, comprising a mixture of maternal and fetal DNA, contains an original amount of DNA of size N0 molecules, usually in the range of 1,000 to 40,000, where p = the true fraction of reference alleles.
(2) In amplification using universal ligation adaptors, it is assumed that N1 molecules are sampled, usually N1 ~ N0/2 molecules, and random sampling bias is introduced by this sampling. The amplified sample can contain N2 molecules, where N2 >> N1. X1 represents the number of reference alleles (on a per-SNP basis) out of the N1 sampled molecules, with a variation in p1 = X1/N1 that carries the random sampling bias through the rest of the protocol. This sampling bias is included in the model by using a Beta-Binomial (BB) distribution instead of a simple Binomial distribution model. The N parameter of the Beta-Binomial distribution can be estimated later on a per-sample basis from training data, after adjusting for leakage and amplification bias, at SNPs with 0 < p < 1. Leakage is the tendency for a SNP to be read incorrectly.
(3) The amplification step will amplify any allelic bias, so amplification bias is introduced due to possibly uneven amplification. It is assumed that one allele at a locus is amplified f times and the other allele at that locus is amplified g times, where $f = g e^{b}$, and b = 0 indicates no bias. The bias parameter, b, is centered at 0, and indicates how much more or less allele A is amplified compared with allele B at a particular SNP. Parameter b may differ from SNP to SNP. The bias parameter b can be estimated on a per-SNP basis, for example, from training data.
(4) The sequencing step involves sequencing a sample of amplified molecules. At this stage, there may be leakage, where leakage is the situation in which a SNP is read incorrectly. Leakage can result from any number of problems, and can result in a SNP being read not as the correct allele A, but as another allele B found at the locus or as an allele C or D not typically found at that locus. It is assumed that sequencing measures the sequence data of N3 DNA molecules from the amplified sample, where N3 < N2. In some embodiments, N3 can be in the range of 20,000 to 100,000, 100,000 to 500,000, 500,000 to 4,000,000, 4,000,000 to 20,000,000 or 20,000,000 to 100,000,000. Each sampled molecule has a probability pg of being read correctly, in which case it will show up correctly as allele A. The molecule will be read incorrectly, as an allele unrelated to the original molecule, with probability 1 - pg, and will then look like allele A with probability pr, allele B with probability pm, or allele C or allele D with probability po, where pr + pm + po = 1. The parameters pg, pr, pm, po are estimated on a per-SNP basis from the training data.
[000393] Different protocols may involve similar steps with variations in the molecular biology steps, resulting in different amounts of random sampling, different levels of amplification and different leakage bias. The following model can be applied equally well to each of these cases. The model for the amount of DNA sampled, on a per-SNP basis, is given by:
$X_3 \sim \mathrm{BetaBinomial}\big(L(F(p, b), p_r, p_g),\; N \cdot H(p, b)\big)$
where p = the true amount of reference DNA, b = the per-SNP bias, and, as described above, pg is the probability of a correct read, pr is the probability that a read is read incorrectly but happens to look like the correct allele, in the case of a bad read, and:
$F(p, b) = \dfrac{p\, e^{b}}{p\, e^{b} + (1 - p)}, \quad H(p, b) = \dfrac{\big(e^{b} p + (1 - p)\big)^2}{e^{b}}, \quad L(p, p_r, p_g) = p \cdot p_g + p_r \cdot (1 - p_g)$
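The three correction functions F, H and L can be written out directly; the sketch below also shows how they combine into the mean and size parameters of the beta-binomial model for X3. The function names are illustrative assumptions, while the formulas follow the definitions given in paragraph [000393].

```python
import math

def amplification_adjusted(p, b):
    """F(p, b): reference fraction after amplification with per-SNP bias b."""
    weighted = p * math.exp(b)
    return weighted / (weighted + (1 - p))

def size_factor(p, b):
    """H(p, b): factor scaling the beta-binomial N parameter."""
    return (math.exp(b) * p + (1 - p)) ** 2 / math.exp(b)

def leakage_adjusted(p, pr, pg):
    """L(p, pr, pg): reference fraction after sequencing leakage.

    pg: probability a molecule is read correctly.
    pr: probability a misread molecule still looks like the reference allele.
    """
    return p * pg + pr * (1 - pg)

def betabinomial_parameters(p, b, pr, pg, n):
    """Mean and size parameters of the model for X3."""
    mean = leakage_adjusted(amplification_adjusted(p, b), pr, pg)
    return mean, n * size_factor(p, b)
```

With b = 0 and pg = 1, the corrections vanish and the model reduces to a beta-binomial centered at p with parameter N.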
[000394] In some embodiments, the method uses a Beta-Binomial distribution instead of a simple binomial distribution; this takes care of the random sampling bias. The N parameter of the Beta-Binomial distribution is estimated on a per-sample basis, as needed. Using the bias corrections F(p, b) and H(p, b), instead of just p, takes care of the amplification bias. The bias parameter b is estimated on a per-SNP basis from the training data ahead of time.
[000395] In some embodiments, the method uses the leakage correction L(p, pr, pg), instead of just p; this takes care of the leakage bias, that is, of varying SNP and sample quality. In some embodiments, the parameters pg, pr, po are estimated on a per-SNP basis from the training data ahead of time. In some embodiments, the parameters pg, pr, po can be updated with the current sample as it is run, to account for varying sample quality.
[000396] The model described here is very general and can account for both differential sample quality and differential SNP quality. Different samples and SNPs are treated differently, as exemplified by the fact that some embodiments use beta-binomial distributions whose mean and variance are a function of the original amount of DNA, as well as of the quality of the sample and of the SNP.
Platform Modeling
[000397] Consider a single SNP where the expected allele ratio present in the plasma is r (based on the maternal and fetal genotypes). The expected allele ratio is defined as the expected fraction of A alleles in the combined maternal and fetal DNA. For maternal genotype gm, child genotype gc and fetal fraction f, the expected allele ratio is given by equation 1, assuming that the genotypes are represented as allele ratios as well:
$r = f\, g_c + (1 - f)\, g_m$ (1)
[000398] The observation at the SNP consists of the number of mapped reads with each allele present, na and nb, which sum to the read depth d. It is assumed that thresholds have already been applied to the mapping probabilities and phred scores, such that the mappings and allele observations can be considered correct. A phred score is a numerical measure that is related to the probability that a particular measurement at a particular base is wrong. In one embodiment, where the base has been measured by sequencing, the phred score can be calculated from the ratio of the dye intensity corresponding to the called base to the dye intensity of the other bases. The simplest model for the observation likelihood is a binomial distribution, which assumes that each of the d reads is drawn independently from a large pool that has allele ratio r. Equation 2 describes this model:
$P(n_a, n_b \mid r) = \binom{n_a + n_b}{n_a}\, r^{n_a} (1 - r)^{n_b}$ (2)
[000399] The binomial model can be extended in several ways. When the maternal and fetal genotypes are either all A or all B, the expected allele ratio in the plasma will be 0 or 1, and the binomial probability will not be well defined. In practice, unexpected alleles are sometimes observed. In one embodiment, it is possible to use a corrected allele ratio $\hat{r} = 1 / (n_a + n_b)$ to allow for a small number of unexpected alleles. In one embodiment, it is possible to use training data to model the rate at which unexpected alleles appear at each SNP, and to use this model to correct the expected allele ratio. When the expected allele ratio is not 0 or 1, the observed allele ratio may not converge to the expected allele ratio even at a sufficiently high read depth, due to amplification bias or other phenomena. The allele ratio can then be modeled as a beta distribution centered at the expected allele ratio, leading to a beta-binomial distribution for P(na, nb | r) that has greater variance than the binomial.
[000400] The platform model for the response at a single SNP will be defined as F(a, b, gc, gm, f) (3), the probability of observing na = a and nb = b given the maternal and fetal genotypes, which also depends on the fetal fraction through equation 1. The functional form of F can be a binomial distribution, a beta-binomial distribution, or similar functions as discussed above.
[000401] In one embodiment, the child fraction can be determined as follows. A maximum likelihood estimate of the fetal fraction f for a prenatal test can be derived without the use of paternal information. This can be relevant where the paternal genetic data are not available, for example, where the father of record is not actually the genetic father of the fetus. The fetal fraction is estimated from the set of SNPs where the maternal genotype is 0 or 1, resulting in a set of only two possible fetal genotypes. Define S0 as the set of SNPs with maternal genotype 0 and S1 as the set of SNPs with maternal genotype 1. The possible fetal genotypes on S0 are 0 and 0.5, resulting in a set of possible allele ratios R0(f) = {0, f/2}. Similarly, R1(f) = {1 - f/2, 1}. This method can be trivially extended to include SNPs where the maternal genotype is 0.5, but these SNPs will be less informative because of the larger set of possible allele ratios.
[000402] Define Na0 and Nb0 as the vectors formed by the nas and nbs of the SNPs in S0, and Na1 and Nb1 similarly for S1. The maximum likelihood estimate f̂ of f is defined by equation 4.
[000403] Assuming that the allele counts at each SNP are independent conditioned on the plasma allele ratio at that SNP, the probabilities can be expressed as products over the SNPs in each set (5).
[000404] The dependence on f is through the sets of possible allele ratios R0(f) and R1(f). The per-SNP probability P(nas, nbs | f) can be approximated by assuming the maximum likelihood genotype conditioned on f. At reasonably high fetal fraction and read depth, the selection of the maximum likelihood genotype will be highly reliable. For example, consider a SNP where the mother has genotype 0, at a fetal fraction of 10 percent and a read depth of 1000. The expected allele ratios are 0 and 5 percent, which are easily distinguishable at sufficiently high read depth. Substituting the estimated child genotype into equation 5 results in the complete equation (6) for the fetal fraction estimate.
[000405] The fetal fraction must lie in the range [0, 1], so the optimization can be easily implemented by a constrained one-dimensional search.
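By way of illustration only, the fetal fraction estimate described above can be sketched as a grid search over [0, 1], using the binomial model and the maximum likelihood genotype at each SNP. The function names, the grid resolution `steps`, and the clipping constant `eps` (which keeps the likelihood defined when a few unexpected alleles are observed at an expected ratio of 0 or 1) are assumptions made for this sketch. Here `na` counts reads of the allele whose plasma ratio is r.

```python
from math import lgamma, log


def binom_loglik(na, nb, r, eps=1e-3):
    """Binomial log likelihood of (na, nb) at allele ratio r.

    eps is a hypothetical clipping constant, not a value from the text.
    """
    r = min(max(r, eps), 1.0 - eps)
    n = na + nb
    return (lgamma(n + 1) - lgamma(na + 1) - lgamma(nb + 1)
            + na * log(r) + nb * log(1.0 - r))


def loglik_fetal_fraction(f, s0, s1):
    """Log likelihood of fetal fraction f, taking at each SNP the most
    likely allele ratio from R0(f) = {0, f/2} or R1(f) = {1 - f/2, 1}."""
    total = 0.0
    for na, nb in s0:  # SNPs with maternal genotype 0
        total += max(binom_loglik(na, nb, r) for r in (0.0, f / 2.0))
    for na, nb in s1:  # SNPs with maternal genotype 1
        total += max(binom_loglik(na, nb, r) for r in (1.0 - f / 2.0, 1.0))
    return total


def estimate_fetal_fraction(s0, s1, steps=1000):
    """Constrained one-dimensional search over f in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda f: loglik_fetal_fraction(f, s0, s1))
```

A grid search is used here only because it is simple to verify; any bounded one-dimensional optimizer could be substituted.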
[000406] In the presence of low read depth or a high noise level, it may be preferable not to assume the maximum likelihood genotype, which can result in artificially high confidence. Another method is to sum over the possible genotypes at each SNP, resulting in the following expression (7) for P(na, nb | f) for a SNP in S0. The a priori probability P(r) could be assumed uniform over R0(f), or it could be based on population frequencies. The extension to the set S1 is trivial.
P(na, nb | f) = Σ_{r ∈ R0(f)} P(na, nb | r) P(r) (7)
[000407] In some embodiments, the probabilities can be derived as follows. A confidence can be calculated from the data probabilities of the two hypotheses Ht and Hf. The probability of each hypothesis is derived based on the response model, the estimated fetal fraction, the mother's genotypes, the allele population frequencies, and the plasma allele counts.
[000408] The following notation is defined:
Gm, Gc: true maternal genotype and true child genotype;
Gaf, Gtf: true genotypes of the alleged father and of the true father;
the inheritance probabilities of the child genotype given the genotypes of the mother and of the true father;
P(g) = P(Gtf = g): population frequency of genotype g at a particular SNP.
[000409] Assuming that the observation at each SNP is independent conditioned on the plasma allele ratio, the probability of a paternity hypothesis is the product of the probabilities over the SNPs. The following equations derive the probability for a single SNP. Equation 8 is a general expression for the probability of any hypothesis h, which will then be specialized to the specific cases of Ht and Hf.
[000410] In the case of Ht, the alleged father is the real father and the fetal genotypes are inherited from the maternal genotypes and the genotypes of the alleged father according to equation 9.
[000411] In the case of Hf, the alleged father is not the real father. The best estimate of the genotypes of the real father is given by the population frequencies in each SNP. Thus, the probabilities of child genotypes are determined by the mother's known genotypes and population frequencies, as in equation 10.
[000412] The confidence Cp in correct paternity is calculated from the products over SNPs of the two probabilities using Bayes' rule (11).
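By way of illustration only, the combination of per-SNP probabilities into a paternity confidence can be sketched as follows. The per-SNP probabilities under Ht and Hf are taken as given inputs, and the prior probability `prior_true` is an assumption made for this sketch (the description does not state a prior here). The products are accumulated in log space to avoid numerical underflow over many SNPs.

```python
from math import exp, log


def paternity_confidence(lik_true, lik_false, prior_true=0.5):
    """Confidence that the alleged father is the true father.

    lik_true[i]  = P(data at SNP i | Ht)
    lik_false[i] = P(data at SNP i | Hf)
    prior_true is a hypothetical prior on Ht (assumed, not from the text).
    """
    log_t = sum(log(p) for p in lik_true) + log(prior_true)
    log_f = sum(log(p) for p in lik_false) + log(1.0 - prior_true)
    # Bayes' rule: Cp = e^t / (e^t + e^f) = 1 / (1 + e^(f - t)).
    diff = log_f - log_t
    if diff > 700.0:  # exp would overflow; the confidence is effectively 0
        return 0.0
    return 1.0 / (1.0 + exp(diff))
```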
Maximum likelihood model using percent fetal fraction
[000413] Determining the ploidy status of a fetus by measuring the free-floating DNA contained in maternal serum, or by measuring the genotypic material in any mixed sample, is a non-trivial exercise. There are a number of methods, for example, performing a read count analysis, where the hypothesis is that if the fetus is trisomic at a particular chromosome, then the total amount of DNA from that chromosome found in the maternal blood will be elevated with respect to a reference chromosome. One way to detect trisomy in such fetuses is to normalize the amount of DNA expected for each chromosome, for example, according to the number of SNPs in the analysis set that correspond to a given chromosome, or according to the number of uniquely mappable portions of the chromosome. Once the measurements have been normalized, any chromosome for which the measured amount of DNA exceeds a certain threshold is determined to be trisomic. This approach is described in Fan et al., PNAS, 2008; 105(42); p. 16266 to 16271, and also in Chiu et al., BMJ 2011; 342:c7401. In the Chiu et al. paper, the normalization was accomplished by calculating a Z score as follows:
[000414] Z score for the percentage of chromosome 21 in the test case = ((percentage of chromosome 21 in the test case) - (mean percentage of chromosome 21 in reference controls)) / (standard deviation of the percentage of chromosome 21 in reference controls).
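By way of illustration only, the Z score normalization quoted above can be sketched as follows. The function name is an assumption, and the use of the sample standard deviation (rather than the population standard deviation) is a choice made for this sketch.

```python
from statistics import mean, stdev


def chr21_z_score(test_pct, reference_pcts):
    """Z score of the chromosome 21 percentage in a test case relative to
    the chromosome 21 percentages measured in reference controls."""
    return (test_pct - mean(reference_pcts)) / stdev(reference_pcts)
```

A single fixed cutoff applied to this score does not depend on the fetal fraction of the sample being tested.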
[000415] These methods determine the ploidy status of the fetus using a single hypothesis rejection method. However, they suffer from some significant shortcomings. Because these methods do not take into account the percentage of fetal DNA in the sample, they use a single fixed cutoff value; as a result, the accuracy of the determinations is not optimal, and cases where the percentage of fetal DNA in the mixture is relatively low will suffer the worst accuracy.
[000416] In one embodiment, a method of the present description used to determine the ploidy state of the fetus takes into account the fraction of fetal DNA in the sample. In another embodiment of the present description, the method involves the use of maximum likelihood estimates. In one embodiment, a method of the present description involves calculating the percentage of DNA in a sample that is of fetal or placental origin. In one embodiment, the threshold for calling aneuploidy is adaptively adjusted based on the calculated percentage of fetal DNA. In some embodiments, the method for estimating the percentage of DNA that is of fetal origin in a mixture of DNA comprises obtaining a mixed sample that comprises genetic material from the mother and genetic material from the fetus, obtaining a genetic sample from the father of the fetus, measuring the DNA in the mixed sample, measuring the DNA in the father's sample, and calculating the percentage of DNA that is of fetal origin in the mixed sample using the DNA measurements of the mixed sample and of the father's sample.
[000417] In one embodiment of the present description, the fraction of fetal DNA, or the percentage of fetal DNA in the mixture, can be measured. In some embodiments, the fraction can be calculated using only the genotyping measurements made on the maternal plasma sample itself, which is a mixture of fetal and maternal DNA. In some embodiments, the fraction can also be calculated using the measured or otherwise known genotype of the mother and/or the measured or otherwise known genotype of the father. In some embodiments, the percentage of fetal DNA can be calculated using the measurements made on the mixture of maternal and fetal DNA together with knowledge of the parental contexts. In one embodiment, the fraction of fetal DNA can be calculated using population frequencies to adjust the model for the probability of particular allele measurements.
[000418] In one embodiment of the present description, a confidence can be calculated for the accuracy of the determination of the ploidy status of the fetus. In one embodiment, the confidence of the most likely hypothesis (Hmajor) can be calculated as (1 - Hmajor) / Σ(all H). It is possible to determine the confidence of a hypothesis if the distributions of all the hypotheses are known. It is possible to determine the distributions of all the hypotheses if the parental genotype information is known. It is possible to calculate a confidence for the ploidy determination if the expected distribution of data for the euploid fetus and the expected distribution of data for the aneuploid fetus are known. In one embodiment, knowledge of the distribution of a test statistic around a normal hypothesis and around an abnormal hypothesis can be used both to determine the reliability of the call and to refine the threshold so that the call is more reliable. This is particularly useful when the amount and/or the percentage of fetal DNA in the mixture is low. It can help to avoid the situation where a fetus that is actually aneuploid is found to be euploid because a test statistic, such as the Z statistic, does not exceed a threshold that was optimized for the case where there is a higher percentage of fetal DNA.
[000419] In one embodiment, a method described here can be used to determine a fetal aneuploidy by determining the number of copies of maternal and fetal target chromosomes in a mixture of maternal and fetal genetic material. This method may involve obtaining maternal tissue comprising both maternal and fetal genetic material; in some embodiments, this maternal tissue may be maternal plasma or a tissue isolated from maternal blood. This method may also involve obtaining a mixture of maternal and fetal genetic material from said maternal tissue by processing the aforementioned maternal tissue. This method may involve distributing the genetic material obtained into a plurality of reaction samples, providing individual reaction samples that comprise a target sequence from a target chromosome and individual reaction samples that do not comprise a target sequence from a target chromosome, for example, by performing high throughput sequencing on the sample. This method may involve analyzing the target sequences of genetic material present or absent in said individual reaction samples to provide a first number of binary results representing the presence or absence of a presumably euploid fetal chromosome in the reaction samples and a second number of binary results representing the presence or absence of a possibly aneuploid fetal chromosome in the reaction samples. Either number of binary results can be calculated, for example, by a computational technique that counts the sequence reads that map to a particular chromosome, to a particular region of a chromosome, or to a particular locus or set of loci. This method may involve normalizing the number of binary events based on the chromosome length, the length of the region of the chromosome, or the number of loci in the set. This method may involve calculating an expected distribution of the number of binary results for a presumably euploid fetal chromosome in the reaction samples using the first number.
This method may involve calculating an expected distribution of the number of binary results for a presumably aneuploid fetal chromosome in the reaction samples using the first number and an estimated fraction of fetal DNA found in the mixture, for example, by multiplying the expected read count distribution of the number of binary results for a presumably euploid fetal chromosome by (1 + n/2), where n is the estimated fetal fraction. In some embodiments, the sequence reads can be treated as probabilistic mappings rather than binary results; this would yield higher accuracy, but requires more computing power. The fetal fraction can be estimated by a plurality of methods, some of which are described elsewhere in this description. This method may involve using a maximum likelihood approach to determine whether the second number corresponds to the possibly aneuploid fetal chromosome being euploid or aneuploid. This method may involve calling the ploidy status of the fetus as the ploidy state that corresponds to the hypothesis with the maximum likelihood of being correct given the measured data.
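By way of illustration only, the comparison of a euploid and a trisomic hypothesis using the (1 + n/2) scaling can be sketched as follows. Modeling the read count as Poisson-distributed is an assumption made for this sketch, as are the function names and the prior `prior_trisomy`; the description does not prescribe a particular count distribution.

```python
from math import lgamma, log


def poisson_logpmf(k, lam):
    """Log probability of observing k reads when lam reads are expected."""
    return k * log(lam) - lam - lgamma(k + 1)


def call_ploidy(observed, expected_euploid, fetal_fraction, prior_trisomy=0.5):
    """Return the hypothesis with the highest posterior score and all scores.

    expected_euploid: expected read count if the chromosome is euploid,
        derived from the presumably euploid chromosome(s).
    fetal_fraction: estimated fetal fraction n; the trisomy expectation is
        the euploid expectation multiplied by (1 + n / 2).
    prior_trisomy: hypothetical prior probability of trisomy (assumed).
    """
    expected = {
        "euploid": expected_euploid,
        "trisomy": expected_euploid * (1.0 + fetal_fraction / 2.0),
    }
    prior = {"euploid": 1.0 - prior_trisomy, "trisomy": prior_trisomy}
    scores = {h: poisson_logpmf(observed, lam) + log(prior[h])
              for h, lam in expected.items()}
    return max(scores, key=scores.get), scores
```

Because the trisomy expectation moves with the fetal fraction, the effective decision boundary adapts to each sample instead of relying on one fixed cutoff.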
[000420] Note that a maximum likelihood model can be used to increase the accuracy of any method that determines the ploidy state of a fetus. Similarly, a confidence can be calculated for any such method. The use of a maximum likelihood model would improve the accuracy of any method where the ploidy determination is made using a single hypothesis rejection technique. A maximum likelihood model can be used for any method where a likelihood distribution can be calculated for both the normal and the abnormal cases. The use of a maximum likelihood model also makes it possible to calculate a confidence for a ploidy determination.
Additional discussion of the method
[000421] In one embodiment, a method described here uses a quantitative measure of the number of independent observations of each allele at a polymorphic locus, and does not involve calculating the ratio of the alleles. This is different from methods, such as some microarray-based methods, that provide information about the ratio of two alleles at a locus but do not quantify the number of independent observations of either allele. Some methods known in the art can provide quantitative information regarding the number of independent observations, but the calculations leading to the ploidy determination use only the allele ratios and do not use the quantitative information. To illustrate the importance of retaining information about the number of independent observations, consider a sample locus with two alleles, A and B. In a first experiment, twenty A alleles and twenty B alleles are observed; in a second experiment, 200 A alleles and 200 B alleles are observed. In both experiments the ratio A / (A + B) is equal to 0.5; however, the second experiment conveys more information than the first about the certainty of the frequency of the A or B allele. The present method, rather than using allele ratios, uses the quantitative data to more accurately model the most likely allele frequencies at each polymorphic locus.
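By way of illustration only, the two experiments above can be compared numerically. The sketch below computes the observed A allele fraction together with its binomial standard error; the function name and the use of the binomial standard error as the measure of certainty are assumptions made for this sketch.

```python
from math import sqrt


def allele_fraction_with_se(n_a, n_b):
    """Observed A allele fraction and its binomial standard error."""
    n = n_a + n_b
    p = n_a / n
    return p, sqrt(p * (1.0 - p) / n)


p_first, se_first = allele_fraction_with_se(20, 20)
p_second, se_second = allele_fraction_with_se(200, 200)
# Both fractions are 0.5, but the standard error of the second experiment
# is smaller by a factor of sqrt(10), information a bare ratio discards.
```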
[000422] In one embodiment, the present methods build a genetic model for aggregating measurements from multiple polymorphic loci to better distinguish trisomy from disomy and also to determine the type of trisomy. In addition, the present method incorporates genetic linkage information to improve the accuracy of the method. This is in contrast to some methods known in the art where allele ratios are averaged across all polymorphic loci on a chromosome. The method described here explicitly models the allele frequency distributions expected in disomy, as well as in trisomy resulting from non-disjunction during meiosis I, non-disjunction during meiosis II, and non-disjunction during mitosis early in fetal development. To illustrate why this is important: if there were no crossovers, non-disjunction during meiosis I would result in a trisomy in which two different homologs were inherited from one parent, whereas non-disjunction during meiosis II or during mitosis early in fetal development would result in two copies of the same homolog from one parent. Each scenario results in different expected allele frequencies at each polymorphic locus and also at all physically linked loci (that is, loci on the same chromosome) considered jointly. Crossovers, which result in the exchange of genetic material between homologs, make the inheritance pattern more complex, but the present method accommodates this by using genetic linkage information, that is, recombination rate information and the physical distance between loci. To better distinguish between meiosis I non-disjunction and meiosis II or mitotic non-disjunction, the present method incorporates into the model an increasing probability of crossover as the distance from the centromere increases.
Meiosis II and mitotic non-disjunction can be distinguished by the fact that mitotic non-disjunction typically results in identical or nearly identical copies of one homolog, while the two homologs present following a meiosis II non-disjunction event often differ because of one or more crossovers during gametogenesis.
[000423] In one embodiment, a method of the present description may not determine the haplotypes of the parents if disomy is assumed. In one embodiment, in the case of trisomy, the present method can make a determination about the haplotypes of one or both parents by using the fact that the plasma receives two copies from one parent, and parental phase information can be determined by noting which two copies were inherited from the parent in question. In particular, a child can inherit either two copies of the same parental homolog (coincident trisomy) or both of the parent's homologs (non-coincident trisomy). At each SNP, the likelihood of coincident trisomy and of non-coincident trisomy can be calculated. A ploidy calling method that does not use a linkage model accounting for crossovers would calculate the overall likelihood of trisomy as a simple weighted average of the coincident and non-coincident trisomies over all chromosomes. However, because of the biological mechanisms that result in disjunction errors and crossovers, trisomy can change from coincident to non-coincident (and vice versa) along a chromosome only where a crossover occurs. The present method takes into account the probability of crossover, resulting in ploidy calls that are more accurate than those of methods that do not.
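By way of illustration only, one way to take the crossover probability into account is a two-state forward recursion along the chromosome, in which the hidden state at each SNP is either coincident or non-coincident trisomy and the state can switch between adjacent SNPs only with the crossover probability. This is a sketch of a standard hidden Markov forward pass, offered as an assumption about how such a linkage model could be computed; the function and parameter names are hypothetical.

```python
def trisomy_likelihood(lik_coincident, lik_noncoincident, p_cross,
                       p_start_coincident=0.5):
    """P(all data | trisomy), summed over all coincident/non-coincident paths.

    lik_coincident[i], lik_noncoincident[i]: P(data at SNP i | state).
    p_cross[i]: crossover probability between SNP i and SNP i + 1
        (one entry fewer than the number of SNPs).
    p_start_coincident: hypothetical prior on the state at the first SNP.
    A production version would work in log space or rescale at each step
    to avoid underflow over many SNPs.
    """
    a_c = p_start_coincident * lik_coincident[0]
    a_n = (1.0 - p_start_coincident) * lik_noncoincident[0]
    for i in range(1, len(lik_coincident)):
        c = p_cross[i - 1]
        # The state is kept with probability 1 - c and switched with c.
        new_c = (a_c * (1.0 - c) + a_n * c) * lik_coincident[i]
        new_n = (a_n * (1.0 - c) + a_c * c) * lik_noncoincident[i]
        a_c, a_n = new_c, new_n
    return a_c + a_n
```

Setting every crossover probability to 0.5 makes the SNPs independent and reproduces a per-SNP average of the two likelihoods, while a probability of 0 forces one state along the whole chromosome.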
[000424] In one embodiment, a reference chromosome is used to determine the child fraction and the noise level or probability distribution. In one embodiment, the child fraction, noise level, and/or probability distribution are determined using only the genetic information available from the chromosome whose ploidy state is being determined. The present method works without a reference chromosome, and without fixing a particular child fraction or noise level. This is a significant improvement over, and point of differentiation from, methods known in the art, where genetic data from a reference chromosome are needed to calibrate the child fraction and chromosome behavior.
[000425] In an embodiment where a reference chromosome is not needed to determine the fetal fraction, the hypothesis is determined as follows: H* = argmax_H LIK(D | H) * priorprob(H)
[000426] With the reference chromosome algorithm, it is typically assumed that the reference chromosome is disomic, and one can then either (a) fix the most likely child fraction and random noise level N based on this assumption and the reference chromosome data: [cfr*, N*] = argmax_{cfr, N} LIK(D(ref. chrom) | H11, cfr, N)
[000427] And then reduce LIK(D | H) = LIK(D | H, cfr*, N*); or (b) estimate the distribution of the child fraction and noise level based on this assumption and the reference chromosome data. In particular, rather than fixing a single value for cfr and N, a probability p(cfr, N) would be assigned over a wider range of possible values of cfr and N: p(cfr, N) ~ LIK(D(ref. chrom) | H11, cfr, N) * priorprob(cfr, N), where priorprob(cfr, N) is the a priori probability of a particular child fraction and noise level, determined by prior knowledge and experiments. If desired, it can simply be uniform over the range of cfr and N. One can then write: LIK(D | H) = Σ_{cfr, N} LIK(D | H, cfr, N) * p(cfr, N)
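By way of illustration only, options (a) and (b) above can be sketched over a discrete grid of candidate (cfr, N) pairs. The likelihood function `lik(data, hypothesis, cfr, n)` is taken as a caller-supplied input, and the function names, the grid, and the default uniform prior are assumptions made for this sketch.

```python
def lik_point_estimate(lik, data, ref_data, hypothesis, grid):
    """Option (a): fix (cfr*, N*) from the reference chromosome under the
    disomy hypothesis H11, then evaluate LIK(D | H, cfr*, N*)."""
    cfr_star, n_star = max(
        grid, key=lambda p: lik(ref_data, "H11", p[0], p[1]))
    return lik(data, hypothesis, cfr_star, n_star)


def lik_marginalized(lik, data, ref_data, hypothesis, grid, prior=None):
    """Option (b): weight every (cfr, N) on the grid by
    p(cfr, N) ~ LIK(D(ref. chrom) | H11, cfr, N) * priorprob(cfr, N)
    and sum LIK(D | H, cfr, N) * p(cfr, N) over the grid."""
    if prior is None:
        prior = lambda cfr, n: 1.0  # uniform prior over the grid (assumed)
    weights = [lik(ref_data, "H11", cfr, n) * prior(cfr, n)
               for cfr, n in grid]
    total = sum(weights)
    return sum(w / total * lik(data, hypothesis, cfr, n)
               for w, (cfr, n) in zip(weights, grid))
```

Option (b) costs one likelihood evaluation per grid point instead of one in total, in exchange for not committing to a single child fraction and noise level.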
[000428] Both of the above methods provide good results.
[000429] Note that, in some cases, using a reference chromosome is not desirable, possible, or feasible. In such a case, it is possible to derive the best ploidy call for each chromosome separately. In particular: LIK(D | H) = Σ_{cfr, N} LIK(D | H, cfr, N) * p(cfr, N | H), where p(cfr, N | H) can be determined as above for each chromosome separately, assuming hypothesis H, and not only for the reference chromosome assuming disomy. Using this method, it is possible to keep both the noise and child fraction parameters fixed, to fix either one of the parameters, or to keep both parameters in probabilistic form for each chromosome and each hypothesis.
[000430] DNA measurements are noisy and/or error-prone, especially measurements where the amount of DNA is small, or where the DNA is mixed with contaminating DNA. This noise results in less accurate genotypic data and less accurate ploidy determinations. In some embodiments, platform modeling or some other method of noise modeling can be used to counter the deleterious effects of noise on the ploidy determination. The present method uses a joint model of both channels, which accounts for the random noise due to the amount of input DNA, DNA quality, and/or protocol quality.
[000431] This is in contrast to some methods known in the art where ploidy determinations are made using the ratio of allele intensities at a locus. That approach precludes accurate SNP noise modeling. In particular, measurement errors typically do not depend specifically on the measured channel intensity ratio, so a ratio-based model is reduced to using one-dimensional information. Accurate modeling of noise, channel quality, and channel interaction requires a two-dimensional joint model, which cannot be built from allele ratios alone.
[000432] In particular, projecting the two-channel information onto the ratio r = f(x, y) = x / y does not lend itself to accurate modeling of channel noise and bias. The noise at a particular SNP is not a function of the ratio, that is, noise(x, y) ≠ f(x, y); it is in fact a joint function of both channels. For example, in the binomial model, the noise of the measured ratio has variance r(1 - r) / (x + y), which is not purely a function of r. In a model where channel bias or noise is included, suppose that at SNP i the observed channel X value is x = ai X + bi, where X is the true channel value and bi is the extra channel bias and random noise. Similarly, suppose that y = ci Y + di. The observed ratio r = x / y cannot accurately predict the true ratio X / Y or model the leftover noise, since (ai X + bi) / (ci Y + di) is not a function of X / Y.
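By way of illustration only, the last point can be checked numerically: two loci with the same true ratio X / Y produce different observed ratios once an additive channel bias is present. The function name and the bias values below are hypothetical and chosen only for the demonstration.

```python
def observed_ratio(X, Y, a, b, c, d):
    """Observed ratio x / y with x = a * X + b and y = c * Y + d."""
    return (a * X + b) / (c * Y + d)


# Both loci have true ratio X / Y = 2, with identical (hypothetical) bias.
r_low = observed_ratio(20, 10, 1.0, 5.0, 1.0, 5.0)     # (20 + 5) / (10 + 5)
r_high = observed_ratio(200, 100, 1.0, 5.0, 1.0, 5.0)  # (200 + 5) / (100 + 5)
# r_low and r_high differ, so the observed ratio alone cannot recover X / Y.
```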
[000433] The method described here provides an effective way to model noise and bias using joint binomial distributions of all the measurement channels individually. The relevant equations can be found elsewhere in the document, in the sections that discuss per-SNP consistent bias, P(good), and P(ref | bad), P(mut | bad), which effectively adjust the SNP behavior. In one embodiment, a method of the present description uses a beta-binomial distribution, which avoids the limiting practice of relying on allele ratios only and instead models the behavior based on both channel counts.
[000434] In one embodiment, a method described here can determine the ploidy of a gestating fetus from the genetic data found in maternal plasma using all available measurements. In one embodiment, a method described here can determine the ploidy of a gestating fetus from the genetic data found in maternal plasma using measurements from only a subset of parental contexts. Some methods known in the art use only the measured genetic data where the parental context is AA | BB, that is, where the parents are both homozygous at a given locus, but for different alleles. A problem with this method is that only a small proportion of polymorphic loci are from the AA | BB context, typically less than 10%. In one embodiment of a method described here, the method does not use genetic measurements of the maternal plasma made at loci where the parental context is AA | BB. In one embodiment, the present method uses plasma measurements only for those polymorphic loci with the AA | AB, AB | AA, and AB | AB parental contexts.
[000435] Some methods known in the art involve averaging allele ratios from SNPs in the AA | BB context, where both parental genotypes are present, and claim to determine ploidy from the average allele ratio at these SNPs. This approach suffers from significant inaccuracy due to differential SNP behavior. Note that it assumes that both parental genotypes are known. In contrast, in some embodiments, the present method uses a joint channel distribution model that does not assume the presence of either parent and does not assume uniform SNP behavior. In some embodiments, the present method accounts for the different SNP behavior/weighting. In some embodiments, the present method does not require knowledge of one or both parental genotypes. An example of how the present method can accomplish this follows:
[000436] In some embodiments, the log likelihood of a hypothesis can be determined on a per-SNP basis. At a particular SNP i, assuming fetal ploidy hypothesis H and percent fetal DNA cf, the log likelihood of the observed data D is defined as: LIK(D | H, i) = log P(D | H, cf, i) = log( Σ_{m, f, c} P(D | m, f, c, H, cf, i) P(c | m, f, H) P(m | i) P(f | i) )
[000437] where m are the possible true mother genotypes and f are the possible true father genotypes, with m, f ∈ {AA, AB, BB}, and c are the possible child genotypes given hypothesis H. In particular, for monosomy c ∈ {A, B}, for disomy c ∈ {AA, AB, BB}, and for trisomy c ∈ {AAA, AAB, ABB, BBB}. Note that including parental genotypic data typically results in more accurate ploidy determinations; however, parental genotypic data are not necessary for this method to work well.
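By way of illustration only, the per-SNP log likelihood above can be sketched for the disomy hypothesis when neither parental genotype is known. Several pieces are assumptions made for this sketch: P(m | i) and P(f | i) are taken from Hardy-Weinberg proportions at a population A allele frequency `p_a`, P(D | ...) is a plain binomial, the expected plasma A ratio mixes the maternal and child genotypes in proportion to cf (standing in for the mixture relation referred to elsewhere as equation 1), and `eps` clips ratios of 0 and 1.

```python
from itertools import product
from math import exp, lgamma, log

GENOTYPES = ("AA", "AB", "BB")


def hardy_weinberg(p_a):
    """Genotype priors from the population A allele frequency (assumed)."""
    q = 1.0 - p_a
    return {"AA": p_a * p_a, "AB": 2.0 * p_a * q, "BB": q * q}


def child_given_parents_disomy(m, f):
    """Mendelian P(c | m, f): one allele drawn from each parent."""
    probs = {g: 0.0 for g in GENOTYPES}
    for allele_m, allele_f in product(m, f):
        probs["".join(sorted(allele_m + allele_f))] += 0.25
    return probs


def binom_lik(na, nb, r, eps=1e-3):
    """Binomial likelihood of (na, nb); eps is a hypothetical clip."""
    r = min(max(r, eps), 1.0 - eps)
    n = na + nb
    return exp(lgamma(n + 1) - lgamma(na + 1) - lgamma(nb + 1)
               + na * log(r) + nb * log(1.0 - r))


def log_lik_snp_disomy(na, nb, cf, p_a):
    """log of the sum over m, f, c of
    P(D | m, f, c, H, cf) P(c | m, f, H) P(m) P(f), for H = disomy."""
    prior = hardy_weinberg(p_a)
    total = 0.0
    for m, f in product(GENOTYPES, repeat=2):
        for c, p_c in child_given_parents_disomy(m, f).items():
            if p_c == 0.0:
                continue
            # Expected A ratio in plasma for a maternal/fetal mixture.
            r = (1.0 - cf) * m.count("A") / 2.0 + cf * c.count("A") / 2.0
            total += binom_lik(na, nb, r) * p_c * prior[m] * prior[f]
    return log(total)
```

Summing these per-SNP values over a chromosome, and repeating for each ploidy hypothesis, gives the quantities that a maximum likelihood call would compare.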
[000438] Some methods known in the art involve averaging allele ratios from SNPs where the mother is homozygous but a different allele is measured in the plasma (either the AA | AB or the AA | BB context), and claim to determine ploidy from the average allele ratio at these SNPs. This method is intended for cases where the paternal genotype is not available. It is questionable how accurately one can claim that the plasma is heterozygous at a particular SNP without the presence of a homozygous BB father: for cases with a low child fraction, what looks like the presence of a B allele could be merely noise; additionally, what looks like no B present could be simple allele dropout in the fetal measurements. Even in a case where plasma heterozygosity can be determined, this method will not be able to distinguish paternal trisomies. In particular, for SNPs where the mother is AA and where some B is measured in the plasma, if the father is BB, the resulting child genotype is ABB, resulting in an average ratio of 33% A (for child fraction = 100%). But in the case where the father is AB, the resulting child genotype could be ABB for coincident trisomy, contributing to the 33% A ratio, or AAB for non-coincident trisomy, pushing the average ratio toward 66% A. Given that many trisomies occur on chromosomes with crossovers, the chromosome as a whole can lie anywhere between fully coincident and fully non-coincident trisomy, so this ratio can vary anywhere between 33% and 66%. For plain disomy, the ratio should be around 50%. Without the use of a linkage model or an accurate error model of the average, this method would miss many cases of paternal trisomy. In contrast, the method disclosed here assigns parental genotype probabilities to each parental genotype candidate, based on the available genotype information and population frequency, and does not explicitly require parental genotypes.
Additionally, the method described here is capable of detecting trisomy even in the absence or presence of parental genotypic data, and can compensate by identifying the points of possible crossovers from coincident to non-coincident trisomy using a linkage model.
[000439] Some methods known in the art claim a method for averaging allele ratios from SNPs where neither the maternal nor the paternal genotype is known, and for determining ploidy from the average ratio at those SNPs. However, a method for accomplishing these ends is not disclosed. The method described here is capable of making accurate ploidy determinations in such a situation, and the reduction to practice is described in this document, using a joint maximum likelihood method that optionally uses SNP noise and bias models as well as a linkage model.
[000440] Some methods known in the art involve averaging allele ratios and claim to determine ploidy from the average allele ratio at one or a few SNPs. However, such methods do not use the concept of linkage. The methods described here do not suffer from these drawbacks.
[000441] Using sequence length as a prior to determine the origin of DNA
[000442] It has been reported that the sequence length distribution differs between maternal and fetal DNA, with fetal DNA generally being shorter. In one embodiment of the present description, it is possible to use prior knowledge in the form of empirical data to construct a priori distributions for the expected length of both maternal DNA, P(X | maternal), and fetal DNA, P(X | fetal). Given a new unidentified DNA sequence of length x, it is possible to assign a probability that the sequence is either maternal or fetal DNA, based on the a priori likelihood of x given either maternal or fetal DNA. In particular, if P(x | maternal) > P(x | fetal), then the DNA sequence can be classified as maternal, with P(maternal | x) = P(x | maternal) / [P(x | maternal) + P(x | fetal)], and if P(x | maternal) < P(x | fetal), then the DNA sequence can be classified as fetal, with P(fetal | x) = P(x | fetal) / [P(x | maternal) + P(x | fetal)]. In one embodiment of the present description, distributions of maternal and fetal sequence lengths that are specific to a given sample can be determined by considering the sequences that can be assigned as maternal or fetal with high probability, and this sample-specific distribution can then be used as the expected size distribution for that sample.
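By way of illustration only, the length-based classification above can be sketched as follows. Modeling each length distribution as a normal distribution is an assumption made for this sketch (the description calls only for empirically derived distributions), and the function names are hypothetical. Equal prior weight is given to the two origins, matching the normalization in the text.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution, used here as a stand-in for an
    empirically derived fragment length distribution (assumed)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def classify_fragment(length, maternal, fetal):
    """Classify a fragment by length and return (label, probability).

    maternal and fetal are (mean, standard deviation) pairs describing
    the expected length distributions. Ties are assigned to "fetal",
    a case the text does not address.
    """
    p_m = normal_pdf(length, *maternal)
    p_f = normal_pdf(length, *fetal)
    if p_m > p_f:
        return "maternal", p_m / (p_m + p_f)
    return "fetal", p_f / (p_m + p_f)
```

The fragments classified with high probability could then be used to re-estimate sample-specific length distributions, as the paragraph above suggests.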
[000443] Variable read depth to minimize sequencing cost
[000444] In many clinical trials concerning a diagnostic, for example, in Chiu et al., BMJ 2011; 342:c7401, a protocol with a number of parameters is set, and the same protocol is then executed with the same parameters for each of the patients in the trial. In the case of determining the ploidy status of a fetus gestating in a mother using sequencing as a method to measure genetic material, one pertinent parameter is the number of reads. The number of reads can refer to the number of actual reads, the number of intended reads, fractional lanes, full lanes, or full flow cells on a sequencer. In these studies, the number of reads is typically set at a level that will ensure that all or nearly all of the samples achieve the desired level of accuracy. Sequencing is currently an expensive technology, costing roughly $200 per 5 million mappable reads, and while the price is dropping, any method that allows a sequencing-based diagnostic to operate at a similar level of accuracy with fewer reads will necessarily save a considerable amount of money.
[000445] The accuracy of a ploidy determination typically depends on a number of factors, including the number of reads and the fraction of fetal DNA in the mixture. Accuracy is typically higher when the fraction of fetal DNA in the mixture is higher. At the same time, accuracy is typically higher when the number of reads is greater. It is possible to have two cases where the ploidy state is determined with comparable accuracy, where the first case has a lower fraction of fetal DNA in the mixture than the second, and more reads were sequenced in the first case than in the second. It is possible to use the estimated fraction of fetal DNA in the mixture as a guide in determining the number of reads needed to achieve a given level of accuracy.
[000446] In one embodiment of the present description, a set of samples can be run where different samples in the set are sequenced to different reading depths, where the number of readings run on each sample is chosen to achieve a given level of accuracy given the calculated fraction of fetal DNA in each mixture. In one embodiment of the present description, this may involve making a measurement of the mixed sample to determine the fraction of fetal DNA in the mixture; this fetal fraction estimate can be done without sequencing, it can be done with TaqMan, it can be done with qPCR, it can be done with SNP arrays, or it can be done with any method that can distinguish different alleles at a given locus. The need for a fetal fraction estimate can be eliminated by including hypotheses that cover all, or a selected set of, fetal fractions in the set of hypotheses that are considered when comparing to the actual measured data. After the fraction of fetal DNA in the mixture has been determined, the number of sequences to be read for each sample can be determined.
[000447] In one embodiment of the present description, 100 pregnant women visit their respective OBs, and their blood is drawn into blood tubes with an anti-oxidant and / or something to inactivate DNase. They take home a kit for the father of their gestating fetus to provide a saliva sample. Both sets of genetic materials for all 100 couples are sent back to the laboratory, where the mother's blood is spun down and the buffy coat is isolated, as well as the plasma. The plasma comprises a mixture of maternal DNA and placentally derived DNA. The maternal buffy coat and the paternal sample are genotyped using an SNP array, and the DNA in the maternal plasma samples is targeted with SURE SELECT hybridization probes. The DNA that was pulled down with the probes is used to generate 100 tagged libraries, one for each of the maternal samples, where each sample is tagged with a different tag. A fraction of each library is withdrawn, and these fractions are mixed together and added to two lanes of an ILLUMINA HISEQ DNA sequencer in a multiplexed fashion, where each lane results in approximately 50 million mappable readings, giving approximately 100 million mappable readings across the 100 multiplexed mixtures, or approximately 1 million readings per sample. The sequence readings were used to determine the fraction of fetal DNA in each mixture. 50 of the samples had more than 15% fetal DNA in the mixture, and 1 million readings were sufficient to determine the ploidy status of the fetuses with 99.9% confidence.
[000448] Of the remaining mixtures, 25 had between 10 and 15% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ, generating an additional 2 million readings for each sample. The two sets of sequence data for each of the mixtures with between 10 and 15% fetal DNA were added together, and the resulting 3 million readings per sample were sufficient to determine the ploidy status of these fetuses with 99.9% confidence.
[000449] Of the remaining mixtures, 13 had between 6 and 10% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ, generating an additional 4 million readings for each sample. The two sets of sequence data for each of the mixtures with between 6 and 10% fetal DNA were added together, and the resulting 5 million readings per sample were sufficient to determine the ploidy status of these fetuses with 99.9% confidence.
[000450] Of the remaining mixtures, 8 had between 4 and 6% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ, generating an additional 6 million readings for each sample. The two sets of sequence data for each of the mixtures with between 4 and 6% fetal DNA were added together, and the resulting 7 million readings per sample were sufficient to determine the ploidy status of these fetuses with 99.9% confidence.
[000451] Of the remaining mixtures, all had between 2 and 4% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ, generating an additional 12 million readings for each sample. The two sets of sequence data for each of the mixtures with between 2 and 4% fetal DNA were added together, and the resulting 13 million readings per sample were sufficient to determine the ploidy status of these fetuses with 99.9% confidence.
[000452] This method required six lanes of sequencing on a HISEQ machine to achieve 99.9% accuracy over 100 samples. If the same number of readings had been required for every sample, to ensure that every ploidy determination was made with 99.9% accuracy, it would have taken 25 lanes of sequencing, and if a no-call rate or error rate of 4% were tolerated, this could have been achieved with 14 lanes of sequencing.
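The lane counts in the example above can be checked with a short calculation. The sketch below is illustrative only: it assumes exactly 50 million mappable readings per lane (the text says "approximately"), and it infers that 4 samples fall in the 2 to 4% tier because 100 - 50 - 25 - 13 - 8 = 4. The variable names are hypothetical.

```python
READINGS_PER_LANE_M = 50  # assumed round figure, in millions of mappable readings

# (fetal-fraction tier, number of samples, additional readings per sample in millions)
tiers = [
    ("> 15%", 50, 0),
    ("10-15%", 25, 2),
    ("6-10%", 13, 4),
    ("4-6%", 8, 6),
    ("2-4%", 4, 12),  # sample count inferred from the other tiers
]

n_samples = sum(count for _, count, _ in tiers)

# Adaptive scheme: two shared initial lanes, then top-ups per tier.
initial_m = 2 * READINGS_PER_LANE_M
top_up_m = sum(count * extra for _, count, extra in tiers)
adaptive_lanes = (initial_m + top_up_m) / READINGS_PER_LANE_M

# Uniform scheme: every sample sequenced to the deepest total (1 + 12 = 13 million).
uniform_lanes = n_samples * 13 / READINGS_PER_LANE_M

# Uniform scheme at 7 million per sample, accepting no-calls below 4% fetal fraction.
relaxed_lanes = n_samples * 7 / READINGS_PER_LANE_M
```

Under these round figures the adaptive scheme comes to just under six lanes and the relaxed uniform scheme to 14 lanes, in line with the text; the full uniform scheme comes to 26 lanes, close to the 25 quoted, with the difference attributable to the approximate readings-per-lane figure.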
Using raw genotyping data
[000453] There are several methods that can perform NPD using fetal genetic information measured in fetal DNA found in maternal blood. Some of these methods involve taking measurements of fetal DNA using SNP arrays, some methods involve untargeted sequencing, and some methods involve targeted sequencing. Targeted sequencing can target SNPs, STRs, other polymorphic loci, non-polymorphic loci, or some combination thereof. Some of these methods may involve using a commercial or proprietary allele caller that determines the identity of the alleles from the intensity data that comes from the sensors on the measuring machine. For example, the ILLUMINA INFINIUM system or the AFFYMETRIX GENECHIP microarray system involves beads or microchips with attached DNA sequences that can hybridize to complementary DNA segments; upon hybridization, there is a change in the fluorescence properties of the sensor molecule that can be detected. There are also sequencing methods, for example, the ILLUMINA SOLEXA GENOME SEQUENCER or the ABI SOLID GENOME SEQUENCER, where the genetic sequences of DNA fragments are sequenced; as the DNA strand complementary to the strand being sequenced is extended, the identity of the extended nucleotide is typically detected via a fluorescent or radioactive label attached to the complementary nucleotide. In all of these methods, genotype or sequencing data is typically determined based on fluorescent or other signals, or the absence of them. These systems are typically combined with low-level software packages that make specific allele determinations (secondary genetic data) from the analog output of the fluorescent device or other detection device (primary genetic data). For example, in the case of a given allele on an SNP array, the software will make a determination, for example, that a certain SNP is present or not according to whether the fluorescence intensity is measured above or below a certain threshold. 
Similarly, the output of a sequencer is a chromatogram that indicates the level of fluorescence detected for each of the dyes, and the software will make a determination that a certain base pair is A or T or C or G. High-throughput sequencers typically make a series of such measurements, called a reading, that represents the most likely structure of the DNA sequence that has been sequenced. The direct analog output of the chromatogram is defined here as the primary genetic data, and the base pair / SNP determinations made by the software are considered here to be the secondary genetic data. In one embodiment, the primary data refers to the raw intensity data that is the unprocessed output of a genotyping platform, where the genotyping platform can refer to an SNP array or to a sequencing platform. Secondary genetic data refers to processed genetic data, where an allele determination has been made, or sequence data has been assigned base pairs, and / or sequence readings have been mapped to the genome.
[000454] Many higher-level applications take advantage of these allele determinations, SNP determinations and sequence readings, that is, the secondary genetic data, that the genotyping software produces. For example, DNA NEXUS, ELAND or MAQ will take the sequencing readings and map them to the genome. For example, in the context of non-invasive prenatal diagnosis, complex informatics, such as PARENTAL SUPPORT®, can leverage a large number of SNP determinations to determine an individual's genotype. Also, in the context of pre-implantation genetic diagnosis, it is possible to obtain a set of sequence readings that are mapped to the genome and, by obtaining a normalized count of the readings that are mapped to each chromosome, or section of a chromosome, it may be possible to determine an individual's ploidy status. In the context of non-invasive prenatal diagnosis, it is possible to obtain a set of sequence readings measured on the DNA present in maternal plasma, and map them to the genome. One can then obtain a normalized count of the readings that are mapped to each chromosome, or section of a chromosome, and use that data to determine an individual's ploidy status. For example, it may be possible to conclude that those chromosomes that have disproportionately large numbers of readings are trisomic in the fetus gestating in the mother from whom the blood was drawn.
[000455] However, in reality, the initial output of the measuring instruments is an analog signal. When a certain base pair is determined by the software that is associated with the sequencer, for example, when the software calls the base pair a T, the call is in reality the determination that the software believes to be most likely. In some cases, however, the determination may be of low confidence; for example, the analog signal may indicate that the particular base pair is only 90% likely to be a T, and 10% likely to be an A. In another example, the genotype determination software that is associated with an SNP array reader may call a certain allele as G. However, in reality, the underlying analog signal may indicate that it is only 70% likely that the allele is G, and 30% likely that the allele is T. In these cases, when higher-level applications use the genotype determinations and sequence determinations made by the lower-level software, they are losing some information. That is, the primary genetic data, as measured directly by the genotyping platform, may be noisier than the secondary genetic data that is determined by the attached software packages, but it contains more information. In mapping secondary genetic data sequences to the genome, many readings are discarded because some bases are not read with sufficient clarity and / or the mapping is not clear. When primary genetic data sequence readings are used, all or many of the readings that would have been discarded upon conversion to secondary genetic data can be used by treating the readings in a probabilistic manner.
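To illustrate the difference between using secondary and primary genetic data, the following sketch counts alleles at a single SNP in two ways. The per-base probabilities are invented for illustration, and the function names and the 0.8 confidence cutoff are assumptions, not part of any genotyping package. With hard calls, a low-confidence observation is discarded; with probabilistic treatment, every observation contributes in proportion to its probability.

```python
# Each observation at one SNP carries per-base probabilities derived from the
# raw signal. The values below are made up for illustration.
observations = [
    {"A": 0.02, "C": 0.00, "G": 0.00, "T": 0.98},
    {"A": 0.10, "C": 0.00, "G": 0.00, "T": 0.90},
    {"A": 0.45, "C": 0.00, "G": 0.00, "T": 0.55},
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
]


def hard_counts(obs, min_confidence=0.8):
    """Secondary-data style: keep only the most likely base, drop unclear calls."""
    counts = {}
    for probs in obs:
        base, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= min_confidence:
            counts[base] = counts.get(base, 0) + 1
    return counts


def soft_counts(obs):
    """Primary-data style: every observation adds its full probability vector."""
    counts = {}
    for probs in obs:
        for base, p in probs.items():
            counts[base] = counts.get(base, 0.0) + p
    return counts


hard = hard_counts(observations)
soft = soft_counts(observations)
```

Here the third observation is dropped by `hard_counts` but still contributes 0.55 of a T and 0.45 of an A to `soft_counts`, so no measurement is thrown away.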
[000456] In one embodiment of the present description, the higher-level software does not rely on the allele determinations, SNP determinations, or sequence readings that are determined by the lower-level software. Instead, the higher-level software bases its calculations on the analog signals measured directly by the genotyping platform. In one embodiment of the present description, a computer-based method such as PARENTAL SUPPORT® is modified so that its ability to reconstruct the genetic data of the embryo / fetus / child is designed to directly use the primary genetic data measured by the genotyping platform. In one embodiment of the present description, a computer-based method such as PARENTAL SUPPORT® is capable of making allele determinations and / or chromosome copy number determinations using primary genetic data, and not using secondary genetic data. In one embodiment of the present description, all genetic determinations, SNP determinations, sequence readings, and sequence mappings are treated in a probabilistic manner using the raw intensity data measured directly by the genotyping platform, rather than converting the primary genetic data into secondary genetic determinations. In one embodiment, the DNA measurements of the prepared sample used in calculating allele count probabilities and determining the relative probability of each hypothesis comprise primary genetic data.
[000457] In some embodiments, the method may increase the accuracy of genetic data from a target individual by incorporating genetic data from at least one related individual, the method comprising obtaining primary genetic data specific to the genome of the target individual and genetic data specific to the genome(s) of the related individual(s), creating a set of one or more hypotheses concerning which segments of which chromosomes of the related individual(s) correspond to those segments in the genome of the target individual, determining the probability of each of the hypotheses given the primary genetic data of the target individual and the genetic data of the related individual(s), and using the probabilities associated with each hypothesis to determine the most likely state of the actual genetic material of the target individual. In some embodiments, the method can determine the number of copies of a chromosome segment in the genome of a target individual, the method comprising creating a set of copy number hypotheses about how many copies of the chromosome segment are present in the genome of the target individual, incorporating primary genetic data from the target individual and genetic information from one or more related individuals into a data set, estimating the characteristics of the platform response associated with the data set, where the platform response may vary from one experiment to another, computing the conditional probabilities of each copy number hypothesis given the data set and the platform response characteristics, and determining the number of copies of the chromosome segment based on the most likely copy number hypothesis. 
In one embodiment, a method of the present description can determine a ploidy state of at least one chromosome in a target individual, the method comprising obtaining primary genetic data from the target individual and from one or more related individuals, creating a set of at least one ploidy state hypothesis for each of the target individual's chromosomes, using one or more specialized techniques to determine a statistical probability for each ploidy state hypothesis in the set, for each specialized technique used, given the obtained genetic data, combining, for each ploidy state hypothesis, the statistical probabilities as determined by the one or more specialized techniques, and determining the ploidy state for each of the chromosomes in the target individual based on the combined statistical probabilities of each ploidy state hypothesis. In one embodiment, a method of the present description can determine an allelic state in a set of alleles, in a target individual, and in one or both parents of the target individual, and optionally in one or more related individuals, the method comprising obtaining primary genetic data from the target individual, from one or both parents, and from any related individuals, creating a set of at least one allelic hypothesis for the target individual, for one or both parents, and optionally for the one or more related individuals, where the hypotheses describe possible allelic states in the set of alleles, determining the statistical probability of each allelic hypothesis in the set of hypotheses given the obtained genetic data, and determining the allelic state for each of the alleles in the set of alleles for the target individual, for one or both parents, and optionally for the one or more related individuals, based on the statistical probabilities of each of the allelic hypotheses.
[000458] In some embodiments, the genetic data of the mixed sample may comprise sequence data that may not map uniquely to the human genome. In some embodiments, the genetic data from the mixed sample may comprise sequence data that map to a plurality of locations in the genome, where each possible mapping is associated with a probability that the given mapping is correct. In some embodiments, sequence readings are not assumed to be associated with a particular position in the genome. In some embodiments, sequence readings are associated with a plurality of positions in the genome, each with an associated probability pertaining to that position.
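One simple way to use readings that map to several locations, sketched below under stated assumptions, is to let each reading contribute a fractional count to every candidate chromosome in proportion to its mapping probability. The data layout and the function name `expected_counts` are hypothetical, and the example probabilities are invented.

```python
from collections import defaultdict

# Each reading lists its candidate mappings as (chromosome, probability that
# the mapping is correct). Hypothetical values for illustration.
readings = [
    [("chr21", 1.0)],
    [("chr21", 0.7), ("chr13", 0.3)],
    [("chr18", 0.5), ("chr13", 0.5)],
]


def expected_counts(mapped_readings):
    """Sum fractional counts per chromosome instead of discarding ambiguous readings."""
    totals = defaultdict(float)
    for candidates in mapped_readings:
        norm = sum(p for _, p in candidates)  # renormalize in case probabilities do not sum to 1
        for chrom, p in candidates:
            totals[chrom] += p / norm
    return dict(totals)


counts = expected_counts(readings)
```

The resulting expected counts sum to the number of readings, so they can be normalized per chromosome in the same way as counts of uniquely mapped readings.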
Combining prenatal diagnosis methods
[000459] There are many methods that can be used for prenatal diagnosis or prenatal screening for aneuploidy or other genetic defects. Described elsewhere in this document, and in U.S. Utility Application Serial No. 11/603,406, filed on November 28, 2006; U.S. Utility Application Serial No. 12/076,348, filed on March 17, 2008; and PCT Application Serial No. PCT/US09/52730, is one such method that uses the genetic data of related individuals to increase the accuracy with which the genetic data of a target individual, such as a fetus, are known or estimated. Other methods used for prenatal diagnosis involve measuring the levels of certain hormones in the maternal blood, where these hormones are correlated with various genetic abnormalities. An example of this is called the triple test, a test where the levels of several different hormones (usually two, three, four or five) are measured in maternal blood. In a case where multiple methods are used to determine the probability of a given result, where none of the methods is definitive in and of itself, it is possible to combine the information given by those methods to make a prediction that is more accurate than any of the individual methods. In the triple test, combining the information given by the three different hormones can result in a prediction of genetic abnormalities that is more accurate than the individual hormone levels can predict.
[000460] Described here is a method for making more accurate predictions about the genetic status of a fetus, specifically the possibility of genetic abnormalities in a fetus, which comprises combining predictions of genetic abnormalities in a fetus where these predictions were made using various methods. A "more accurate" method may refer to a method for diagnosing an anomaly that has a lower false negative rate at a given false positive rate. In a preferred embodiment of the present description, one or more of the predictions are made based on known genetic data about the fetus, where the genetic knowledge was determined using the PARENTAL SUPPORT™ method, that is, using the genetic data of individuals related to the fetus to determine the genetic data of the fetus with greater precision. In some embodiments, genetic data may include fetal ploidy states. In some embodiments, genetic data may refer to a set of allele determinations in the genome of the fetus. In some embodiments, some of the predictions may have been made using the triple test. In some embodiments, some of the predictions may have been made using measurements of other hormone levels in the maternal blood. In some embodiments, predictions made by methods considered diagnostic can be combined with predictions made by methods considered screening. In some embodiments, the method involves measuring levels in the maternal blood of alpha-fetoprotein (AFP). In some embodiments, the method involves measuring levels in the maternal blood of unconjugated estriol (UE3). In some embodiments, the method involves measuring levels in the maternal blood of human chorionic gonadotropin (beta-hCG). In some embodiments, the method involves measuring levels in the maternal blood of invasive trophoblast antigen (ITA). In some embodiments, the method involves measuring levels in the maternal blood of inhibin. In some embodiments, the method involves measuring levels in the maternal blood of pregnancy-associated plasma protein A (PAPP-A). 
In some embodiments, the method involves measuring levels in the maternal blood of other maternal hormones or serum markers. In some embodiments, some of the predictions may have been made by other methods. In some embodiments, some of the predictions may have been made using a fully integrated test, such as one that combines ultrasound and blood testing at approximately 12 weeks of gestation and a second blood test at approximately 16 weeks. In some embodiments, the method involves measuring the fetal nuchal translucency (NT). In some embodiments, the method involves using the measured levels of the hormones mentioned earlier to make predictions. In some embodiments, the method involves a combination of the methods mentioned above.
[000461] There are several ways to combine predictions; for example, one could convert the hormone measurements into multiples of the median (MoM) and then into likelihood ratios (LRs). Likewise, other measurements could be transformed into LRs using a mixture model of NT distributions. The LRs for NT and the biochemical markers could be multiplied by the age- and gestation-related risk to derive the risk of various conditions, such as trisomy 21. Detection rates (DRs) and false positive rates (FPRs) can be calculated by taking the proportions with risks above a given risk threshold.
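The MoM-to-LR-to-risk chain described above can be sketched as follows. This is an illustrative outline under stated assumptions, not a validated screening algorithm: it models log10(MoM) of each marker as Gaussian in affected and unaffected pregnancies, treats markers as independent, and applies the LRs to the prior expressed as odds. All function names and numeric parameters are hypothetical.

```python
import math
from statistics import NormalDist


def likelihood_ratio(value, median, affected, unaffected):
    """LR for one marker.

    `affected` and `unaffected` are (mean, sd) of log10(MoM) in each group;
    both are assumed inputs that would come from reference populations.
    """
    log_mom = math.log10(value / median)  # multiple of the median, on a log scale
    return NormalDist(*affected).pdf(log_mom) / NormalDist(*unaffected).pdf(log_mom)


def posterior_risk(prior_risk, lrs):
    """Multiply the prior odds by each LR and convert back to a probability."""
    odds = prior_risk / (1.0 - prior_risk)
    for lr in lrs:
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical example: one marker at 1.8 times the median, prior risk of 1 in 500.
lr = likelihood_ratio(value=72.0, median=40.0, affected=(0.30, 0.25), unaffected=(0.0, 0.25))
risk = posterior_risk(prior_risk=1 / 500, lrs=[lr])
```

Detection and false positive rates would then follow from the proportion of affected and unaffected pregnancies whose `risk` exceeds a chosen cutoff.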
[000462] In one embodiment, a method for determining ploidy status involves combining the relative probabilities of each of the ploidy hypotheses determined using the joint distribution model and the allele count probabilities with the relative probabilities of each of the ploidy hypotheses calculated using statistical techniques taken from other methods that determine a risk score for a fetus being trisomic, including, but not limited to: a reading count analysis, comparing heterozygosity rates, a statistic that is only available when parental genetic information is used, the probability of normalized genotype signals for certain parental contexts, a statistic that is calculated using an estimated fetal fraction of the first sample or the prepared sample, and combinations thereof.
[000463] Another method could involve a situation with four measured hormone levels, where the probability distribution around these hormones is known: p(x1, x2, x3, x4 | e) for the euploid case and p(x1, x2, x3, x4 | a) for the aneuploid case. Then, one could measure the probability distribution for the DNA measurements, g(y | e) and g(y | a) for the euploid and aneuploid cases, respectively. Assuming that they are independent given the assumption of euploid / aneuploid, one could combine them as p(x1, x2, x3, x4 | a) g(y | a) and p(x1, x2, x3, x4 | e) g(y | e), and then multiply each by the prior, p(a) and p(e) respectively, given the maternal age. One could then choose the hypothesis for which the result is highest.
[000464] In one embodiment, it is possible to invoke the central limit theorem to assume that the distribution g(y | a or e) is Gaussian, and to measure the mean and standard deviation by observing multiple samples. In another embodiment, it can be assumed that they are not independent given the result, and enough samples can be collected to estimate the joint distribution p(x1, x2, x3, x4, y | a or e).
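The combination of hormone and DNA evidence under the conditional independence assumption can be sketched in a few lines. In this illustration the hormone likelihoods p(x1, x2, x3, x4 | h) are passed in as precomputed numbers, g(y | h) is fitted as a Gaussian from reference samples, and all input values are invented; the function names are hypothetical.

```python
from statistics import NormalDist


def fit_dna_model(euploid_samples, aneuploid_samples):
    """Estimate Gaussian g(y | e) and g(y | a) from reference measurements."""
    return {
        "e": NormalDist.from_samples(euploid_samples),
        "a": NormalDist.from_samples(aneuploid_samples),
    }


def combine(p_hormones, y, dna_model, prior):
    """Posterior over hypotheses from p(x | h) * g(y | h) * p(h), normalized."""
    joint = {h: p_hormones[h] * dna_model[h].pdf(y) * prior[h] for h in prior}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Hypothetical inputs for illustration only.
model = fit_dna_model(
    euploid_samples=[0.98, 1.00, 1.02, 1.01, 0.99],
    aneuploid_samples=[1.04, 1.06, 1.05, 1.07, 1.03],
)
posterior = combine(
    p_hormones={"e": 0.20, "a": 0.10},  # p(x1, x2, x3, x4 | h), assumed precomputed
    y=1.05,                             # observed DNA measurement
    dna_model=model,
    prior={"e": 0.99, "a": 0.01},       # age-related prior, assumed
)
best = max(posterior, key=posterior.get)
```

If the hormone and DNA measurements cannot be treated as independent, the product `p_hormones[h] * dna_model[h].pdf(y)` would be replaced by a single joint density estimated from enough samples.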
[000465] In one embodiment, the ploidy state for the target individual is determined to be the ploidy state that is associated with the hypothesis whose probability is the greatest. In some cases, one hypothesis will have a normalized, combined probability greater than 90%. Each hypothesis is associated with one ploidy state, or a set of ploidy states, and the ploidy state associated with the hypothesis whose normalized, combined probability is greater than 90%, or some other threshold value such as 50%, 80%, 95%, 98%, 99%, or 99.9%, may be selected as the determined ploidy state.
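The threshold rule above amounts to a small decision function. The sketch below is a hypothetical illustration; returning `None` when no hypothesis clears the threshold (a no-call) is an assumption made here for completeness.

```python
def call_ploidy(hypothesis_probs, threshold=0.90):
    """Return the most probable hypothesis if its normalized probability
    exceeds the threshold; otherwise return None (no determination)."""
    total = sum(hypothesis_probs.values())
    best, p = max(hypothesis_probs.items(), key=lambda kv: kv[1])
    return best if p / total > threshold else None


confident = call_ploidy({"disomy": 0.97, "trisomy": 0.02, "monosomy": 0.01})
uncertain = call_ploidy({"disomy": 0.60, "trisomy": 0.40})
lenient = call_ploidy({"disomy": 0.60, "trisomy": 0.40}, threshold=0.50)
```

Raising the threshold lowers the error rate at the cost of more no-calls, which is the trade-off discussed in the variable reading depth example.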
[000466] DNA from children of previous pregnancies in maternal blood
[000467] One difficulty for non-invasive prenatal diagnosis is differentiating fetal cells from the current pregnancy from fetal cells from previous pregnancies. Some believe that the genetic material from previous pregnancies will go away after some time, but conclusive evidence has not been shown. In one embodiment of the present description, it is possible to determine the fetal DNA present in maternal blood of paternal origin (that is, the DNA that the fetus inherited from the father), using the PARENTAL SUPPORT® (PS) method and knowledge of the paternal genome. This method can use phased parental genetic information. It is possible to phase the parental genotype from non-phased genotypic information using the grandparents' genetic data (such as the genetic data measured from the grandfather's sperm), or genetic data from other born children, or a sample of a miscarriage. Unphased genetic information could also be phased through HapMap-based phasing, or a haplotyping of paternal cells. Successful haplotyping has been demonstrated by capturing cells in the mitosis phase, when chromosomes are tight bundles, and using microfluidics to place separate chromosomes in separate wells. In another embodiment, it is possible to use the phased parental haplotypic data to detect the presence of more than one homologue from the father, which implies that the genetic material from more than one child is present in the blood. By focusing on chromosomes that are expected to be euploid in a fetus, one can rule out the possibility that the fetus is afflicted with a trisomy. In addition, it is possible to determine whether the fetal DNA is not from the current father, in which case other methods, such as the triple test, can be used to predict genetic abnormalities.
[000468] There may be other sources of fetal genetic material available through methods other than blood withdrawal. In the case of fetal genetic material available in maternal blood, there are two main categories: (1) whole fetal cells, for example, nucleated fetal red blood cells or erythroblasts, and (2) free fetal DNA. In the case of whole fetal cells, there is some evidence that fetal cells may persist in maternal blood for an extended period of time such that it is possible to isolate a cell in a pregnant woman that contains the DNA of a child or fetus in a previous pregnancy. There is also evidence that free fetal DNA is eliminated from the system in a matter of weeks. A challenge is how to determine the identity of the individual whose genetic material is contained in the cell, that is, to ensure that the measured genetic material is not from a fetus from a previous pregnancy. In one embodiment of the present description, knowledge of maternal genetic material can be used to ensure that the genetic material in question is not maternal genetic material. There are a number of methods to achieve this, including computer-based methods, such as PARENTAL SUPPORT®, as described in this document or in any of the patents cited in this document.
[000469] In one embodiment of the present description, the blood drawn from the pregnant woman can be separated into a fraction comprising free fetal DNA and a fraction comprising nucleated red blood cells. The free DNA can optionally be enriched, and the genotypic information of the DNA can be measured. From the genotypic information measured from the free DNA, knowledge of the maternal genotype can be used to determine aspects of the fetal genotype. These aspects may refer to the ploidy state, and / or to a set of allele identities. Then, the individual nucleated red blood cells can be genotyped using methods described in this document, and in other related patents, especially those mentioned in the first section of this document. Knowledge of the maternal genome would make it possible to determine whether or not any given single blood cell is genetically maternal. And the aspects of the fetal genotype that were determined as described above would make it possible to determine whether the single blood cell is genetically derived from the fetus that is currently gestating. In essence, this aspect of the present description makes it possible to use the genetic knowledge of the mother, and possibly the genetic information of other related individuals, such as the father, together with the genetic information measured from the free DNA found in the maternal blood, to determine whether an isolated nucleated cell found in maternal blood is (a) genetically maternal, (b) genetically from the fetus currently gestating, or (c) genetically from a fetus from a previous pregnancy.
Determination of prenatal sex chromosome aneuploidy
[000470] In methods known in the art, people attempting to determine the sex of a gestating fetus from the mother's blood have used the fact that free-floating fetal DNA (fffDNA) is present in the mother's plasma. If Y-specific loci can be detected in maternal plasma, this implies that the gestating fetus is male. However, the lack of detection of Y-specific loci in plasma does not always guarantee that the gestating fetus is female when using methods known in the prior art, as, in some cases, the amount of fffDNA is too low to ensure that Y-specific loci would be detected in the case of a male fetus.
[000471] Presented here is a new method that does not require the measurement of Y-specific nucleic acids, that is, DNA from loci that are exclusively paternally derived. The PARENTAL SUPPORT® method, previously described, uses crossover frequency data, parental genotypic data, and computer techniques to determine the ploidy status of a gestating fetus. The sex of a fetus is simply the ploidy state of the fetus at the sex chromosomes. A child who is XX is female and a child who is XY is male. The method described here is also capable of determining the ploidy status of the fetus. Note that sexing is effectively synonymous with ploidy determination of the sex chromosomes; in the case of sexing, an assumption is often made that the child is euploid, so there are fewer possible hypotheses.
[000472] The method described here involves considering the loci that are common to both the X and Y chromosomes to create a baseline in terms of the expected amount of fetal DNA present for a fetus. Then, those regions that are specific to the X chromosome only can be interrogated to determine whether the fetus is female or male. In the case of a male fetus, less fetal DNA is expected from loci that are specific to the X chromosome than from loci that are common to both X and Y. In contrast, in female fetuses, the amount of DNA for each of these groups is expected to be the same. The DNA in question can be measured by any technique that can quantify the amount of DNA present in a sample, for example, qPCR, SNP arrays, genotyping arrays, or sequencing. For DNA that is exclusively from an individual, we expect to see the following:
[000473] In the case of DNA from a fetus that is mixed with the mother's DNA, where the fraction of fetal DNA in the mixture is F and the fraction of maternal DNA in the mixture is M, such that F + M = 100%, we expect to see the following:
[000474] In the case where F and M are known, the expected ratios can be calculated, and the observed data can be compared with the expected data. In the case where M and F are not known, a threshold can be selected based on historical data. In both cases, the measured amount of DNA at loci common to both X and Y can be used as a baseline, and the test for fetal sex can be based on the amount of DNA observed at loci specific to the X chromosome only. If that amount is less than the baseline by an amount approximately equal to ½ F, or by an amount that causes it to fall below a predefined threshold, the fetus is determined to be male, and if that amount is approximately equal to the baseline, or is not lower by an amount that causes it to fall below a predefined threshold, the fetus is determined to be female.
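The expected amounts implied by the text can be worked out as a short sketch. It assumes, as the ½ F figure above suggests, that signal is proportional to copy number: the mother and a female fetus each carry two copies of both locus groups, while a male fetus carries two copies of loci common to X and Y but only one copy of X-only loci. The function names and the nearest-expectation decision rule are hypothetical choices for illustration.

```python
def expected_x_ratio(fetal_fraction, fetal_sex):
    """Expected X-only signal divided by the baseline signal at loci common to X and Y."""
    f = fetal_fraction
    m = 1.0 - f                       # maternal fraction, so that F + M = 100%
    baseline = m + f                  # two copies of shared loci in mother and fetus
    x_only = m + (f if fetal_sex == "female" else f / 2.0)  # male fetus has one X
    return x_only / baseline


def call_sex(observed_ratio, fetal_fraction):
    """Choose the sex whose expected ratio lies closer to the observed ratio."""
    male = expected_x_ratio(fetal_fraction, "male")
    female = expected_x_ratio(fetal_fraction, "female")
    if abs(observed_ratio - male) < abs(observed_ratio - female):
        return "male"
    return "female"


ratio_male = expected_x_ratio(0.10, "male")
ratio_female = expected_x_ratio(0.10, "female")
```

With a fetal fraction of 10%, the X-only signal for a male fetus is expected to fall about 5% below the baseline, which shows why low fetal fractions demand precise quantification.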
[000475] In another embodiment, one can consider only the loci that are common to both the X and Y chromosomes, often called the Z chromosome. A subset of the loci on the Z chromosome are typically always A on the X chromosome, and B on the Y chromosome. If the SNPs on the Z chromosome are found to have the B genotype, then the fetus is called male; if the SNPs on the Z chromosome are found to have only the A genotype, then the fetus is called female. In another embodiment, loci that are found only on the X chromosome can be considered. Contexts such as AA | B are particularly informative, as the presence of a B indicates that the fetus has an X chromosome from the father. Contexts such as AB | B are also informative, as B is expected to be present only half as often in the case of a female fetus as in the case of a male fetus. In another embodiment, the SNPs on the Z chromosome can be considered where both the A and B alleles are present on both the X and Y chromosomes, and where it is known which SNPs are on the paternal Y chromosome and which are on the paternal X chromosome.
[000476] In one embodiment, it is possible to amplify single nucleotide positions known to vary between the homologous non-recombinant region (HNR) shared by the Y chromosome and the X chromosome. The sequence within that HNR region is identical between the X and Y chromosomes. Within this identical region are single nucleotide positions that, although invariant between X chromosomes and Y chromosomes in the population, are different between X and Y chromosomes. Each PCR assay could amplify a sequence from the loci that are present on both X and Y chromosomes. Within each amplified sequence would be a single base that can be detected using sequencing or some other method.
[000477] In one embodiment, the sex of the fetus can be determined from the free fetal DNA found in maternal plasma, the method comprising some or all of the following steps: 1) design PCR primers (either regular or mini-PCR, with multiplexing if desired) to amplify the single-nucleotide positions that vary between X and Y within the HNR region, 2) obtain maternal plasma, 3) PCR amplify targets from the maternal plasma using the X / Y HNR PCR assays, 4) sequence the amplicons, 5) examine the sequence data for the presence of the Y allele within one or more of the amplified sequences. The presence of one or more Y alleles would indicate a male fetus. The absence of Y alleles from all amplicons would indicate a female fetus.
[000478] In one embodiment, targeted sequencing could be used to measure the DNA in maternal plasma and / or the parental genotypes. In one embodiment, one could ignore all sequences that clearly originate from DNA of paternal origin. For example, in the AA | AB context, one could count the number of A sequences and ignore all B sequences. In order to determine a heterozygosity rate for the above algorithm, one could compare the number of A sequences observed with the expected total number of sequences for the given probe. There are many ways to calculate the expected number of sequences for each probe on a per-sample basis. In one embodiment, it is possible to use historical data to determine what fraction of all sequence reads belongs to each specific probe and then use this empirical fraction, combined with the total number of sequence reads, to estimate the number of sequences at each probe. Another approach could be to target some known homozygous alleles and then use historical data to relate the number of reads at each probe to the number of reads at the known homozygous alleles. For each sample, one could then measure the number of reads at the homozygous alleles and use that measurement, along with the empirically derived relationships, to estimate the number of sequence reads at each probe.
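The two estimation approaches in the preceding paragraph can be illustrated with a short Python sketch. The probe names, historical fractions, and ratios below are invented for illustration only; the function names are likewise hypothetical.

```python
def expected_reads_from_fractions(historical_fractions, total_reads):
    """Scale each probe's historical share of reads by the sample's total read count."""
    return {probe: fraction * total_reads
            for probe, fraction in historical_fractions.items()}


def expected_reads_from_controls(ratio_to_control, control_reads):
    """Scale each probe's historical ratio to the homozygous control loci
    by the reads actually measured at those control loci in this sample."""
    return {probe: ratio * control_reads
            for probe, ratio in ratio_to_control.items()}


# Hypothetical historical data for two probes.
by_fraction = expected_reads_from_fractions({"probe_1": 0.25, "probe_2": 0.75}, 1000)
by_control = expected_reads_from_controls({"probe_1": 0.5}, 400)
print(by_fraction)
print(by_control)
```

The first approach depends only on the total number of reads in the sample, while the second ties the estimate to loci whose genotype is known, which may be more robust if overall read depth varies for reasons unrelated to the probes of interest.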
[000479] In some modalities, it is possible to determine the sex of the fetus by combining the predictions made by a plurality of methods. In some embodiments, a plurality of methods are extracted from the methods described in the present description. In some embodiments, at least one of the plurality of methods is extracted from methods described in the present description.
[000480] In some embodiments, the method described here can be used to determine the ploidy status of the unborn fetus. In one embodiment, the ploidy determination method uses loci that are specific to the X chromosome, or common to both the X and Y chromosomes, but does not make use of any Y-specific loci. In one embodiment, the ploidy determination method uses one or more of the following: loci that are specific to the X chromosome, loci that are common to both the X and Y chromosomes, and loci that are specific to the Y chromosome. In one embodiment, when the sex chromosome ratios are similar, for example, 45,X (Turner syndrome), 46,XX (normal female) and 47,XXX (trisomy X), differentiation can be performed by comparing the allelic distributions with the allelic distributions expected under the various hypotheses. In another embodiment, this can be done by comparing the relative number of sequence reads for the sex chromosomes with that for one or a plurality of reference chromosomes that are assumed to be euploid. It is also noted that these methods can be expanded to include aneuploid cases.
Single gene disease screening
[000481] In one embodiment, a method for determining the ploidy status of the fetus can be extended to allow simultaneous testing for single-gene disorders. Diagnosis of single-gene disease uses the same targeted approach used for aneuploidy testing, and requires additional specific targets. In one embodiment, single-gene non-invasive prenatal diagnosis (NPD) is performed through linkage analysis. In many cases, direct testing of the cfDNA sample is unreliable, as the presence of maternal DNA makes it virtually impossible to determine whether the fetus has inherited the mutation from the mother. Detection of a single paternally derived allele is less demanding, but it is fully informative only if the disease is dominant and carried by the father, which limits the usefulness of the approach. In one embodiment, the method involves PCR or related amplification approaches.
[000482] In some embodiments, the method involves phasing the abnormal allele with surrounding, very tightly linked SNPs in the parents using information from first-degree relatives. Then PARENTAL SUPPORT™ can be run on the targeted sequencing data obtained from these SNPs to determine which homologues, normal or abnormal, were inherited by the fetus from both parents. As long as the SNPs are sufficiently linked, the inheritance of the fetal genotype can be determined very reliably. In some embodiments, the method comprises: (a) adding a set of SNP loci that densely flank a specified set of common disease loci to the multiplex pool used for aneuploidy testing; (b) reliably phasing the alleles of those added SNPs with the normal and abnormal alleles based on genetic data from various relatives; and (c) reconstructing the fetal diplotype, or set of phased SNP alleles on the inherited maternal and paternal homologues in the region around the disease locus, to determine the fetal genotype. In some embodiments, additional probes that are closely linked to a disease-linked locus are added to the set of polymorphic loci being used for aneuploidy testing.
[000483] Reconstructing the fetal diplotype is challenging because the sample is a mixture of maternal and fetal DNA. In some embodiments, the method incorporates information from relatives to phase the SNPs and disease alleles, then takes into account the physical distance between the SNPs, recombination data from location-specific recombination probabilities, and the data observed from genetic measurements of the maternal plasma to obtain the most likely genotype of the fetus.
[000484] In one embodiment, a series of additional probes per locus linked to the disease is included in the set of target polymorphic loci; the number of additional probes per locus linked to the disease can be between 4 and 10, between 11 and 20, between 21 and 40, between 41 and 60, between 61 and 80, or combinations thereof.
Determining the number of DNA molecules in a sample
[000485] A method is described here to determine the number of DNA molecules in a sample by generating a uniquely identified molecule for each original DNA molecule in the sample during the first round of DNA amplification. A procedure for accomplishing this is described here, followed by a clonal or single-molecule sequencing method.
[000486] The approach involves targeting one or more specific loci and generating a tagged copy of the original molecules in such a way that most or all of the tagged molecules from each target locus have a unique tag and can be distinguished from one another by sequencing that barcode using clonal or single-molecule sequencing. Each unique sequenced barcode represents a unique molecule in the original sample. Simultaneously, the sequencing data is used to determine the locus from which the molecule originates. Using this information, it is possible to determine the number of unique molecules in the original sample for each locus.
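By way of a minimal sketch, counting the unique molecules per locus amounts to collecting the distinct barcodes observed at each locus. The code below assumes each sequence read has already been reduced to a (locus, barcode) pair; the locus names and barcode sequences are invented for illustration.

```python
from collections import defaultdict


def count_original_molecules(reads):
    """Return, for each locus, the number of distinct barcodes seen at that locus."""
    barcodes_by_locus = defaultdict(set)
    for locus, barcode in reads:
        barcodes_by_locus[locus].add(barcode)
    return {locus: len(barcodes) for locus, barcodes in barcodes_by_locus.items()}


# Hypothetical reads: amplification has duplicated some tagged molecules.
reads = [
    ("locus_1", "ACGTA"), ("locus_1", "ACGTA"), ("locus_1", "TTGCA"),
    ("locus_2", "GGATC"), ("locus_2", "GGATC"), ("locus_2", "GGATC"),
]
print(count_original_molecules(reads))
```

This sketch treats every distinct barcode as one original molecule and does not attempt to correct sequencing errors within the barcode, which a practical pipeline would likely need to address.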
[000487] This method can be used for any application where quantitative assessment of the number of molecules in an original sample is required. In addition, the number of single molecules of one or more targets can be related to the number of single molecules to one or more other targets to determine the relative number of copies, the allelic distribution, or the ratio of alleles. Alternatively, the number of copies detected from multiple targets can be modeled by a distribution, in order to identify the most likely number of copies of the original targets. Applications include, but are not limited to, the detection of insertions and deletions such as those found in patients with Duchenne Muscular Dystrophy; the quantification of deletions or duplications of chromosome segments, such as those observed in variants of the number of copies; number of chromosome copies of samples from individuals born; number of chromosome copies of samples from unborn individuals such as embryos or fetuses.
[000488] The method can be combined with the simultaneous evaluation of variants contained in the target by the sequence. This can be used to determine the number of molecules that represent each allele in the original sample. This copy number method can be combined with the evaluation of SNPs or other sequence variations to determine the number of copies of the chromosome of individuals born and unborn, the discrimination and the quantification of copies from the loci that have variations in short sequence, but in which the PCR can amplify from multiple target regions, such as in the detection of patients with spinal muscular atrophy; determining the number of copies of different sources of molecules from samples consisting of mixtures from different individuals, such as in the detection of fetal aneuploidy from free DNA obtained from maternal plasma.
[000489] In one embodiment, the method as it relates to a single target locus can comprise one or more of the following steps: (1) Design a standard pair of oligomers for PCR amplification of a specific locus. (2) Add, during synthesis, a sequence of specified bases with minimal or no complementarity to the target locus or genome to the 5' end of one of the target-specific oligomers. This sequence, called the tail, is a known sequence, to be used for subsequent amplification, followed by a sequence of random nucleotides. These random nucleotides comprise the random region. The random region comprises a randomly generated sequence of nucleic acids that probabilistically differs between each probe molecule. Consequently, after synthesis, the pool of tailed oligomers will consist of a collection of oligomers beginning with a known sequence, followed by an unknown sequence that differs between molecules, followed by the target-specific sequence. (3) Perform one round of amplification (denaturation, annealing, extension) using only the tailed oligomer. (4) Add exonuclease to the reaction, effectively stopping the PCR reaction, and incubate the reaction at the appropriate temperature to remove forward single-stranded oligos that did not anneal to a template and extend to form a double-stranded product. (5) Incubate the reaction at an elevated temperature to denature the exonuclease and eliminate its activity. (6) Add to the reaction a new oligonucleotide that is complementary to the tail of the oligomer used in the first reaction, together with the other target-specific oligomer, to allow PCR amplification of the product generated in the first round of PCR. (7) Continue amplification to generate enough product for downstream clonal sequencing. (8) Measure the amplified PCR product by any of various methods, for example, clonal sequencing, to a sufficient number of bases to span the sequence.
[000490] In one embodiment, a method of the present description involves targeting multiple loci in parallel or otherwise. Primers for different target loci can be generated independently and mixed to create multiplex PCR groups. In one embodiment, the original samples can be divided into different subgroups and loci can be targeted in each subgroup before being recombined and sequenced. In one embodiment, the tagging step and a number of amplification cycles can be performed before the group is subdivided to ensure effective targeting of all targets before splitting, and improving subsequent amplification by continuing amplification using smaller sets of primers in subdivided groups.
[000491] An example of an application where this technology would be particularly useful is the non-invasive prenatal diagnosis of aneuploidy, where the allele ratio at a given locus or the distribution of alleles at a number of loci can be used to help determine the number of copies of a chromosome present in a fetus. In this context, it is desirable to amplify the DNA present in the initial sample while maintaining the relative amounts of the various alleles. In some embodiments, especially in cases where there is a very small amount of DNA, for example, less than 5,000 copies of the genome, less than 1,000 copies of the genome, less than 500 copies of the genome, or less than 100 copies of the genome, one can encounter a phenomenon called bottlenecking: when there is a small number of copies of any given allele in the initial sample, amplification biases can result in the amplified pool of DNA having allele ratios significantly different from those in the initial DNA mixture. By applying a unique or nearly unique set of barcodes to each DNA strand prior to standard PCR amplification, it is possible to exclude n-1 copies of DNA from a set of n identical sequenced DNA molecules that originated from the same original molecule.
[000492] For example, imagine a heterozygous SNP in an individual's genome, and a mixture of DNA from the individual where ten molecules of each allele are present in the original DNA sample. After amplification, there may be 100,000 DNA molecules corresponding to that locus. Due to stochastic processes, the ratio of the alleles in the amplified DNA could be anywhere from 1: 2 to 2: 1. However, since each of the original molecules was tagged with a unique tag, it would be possible to determine that the DNA in the amplified pool originated from exactly 10 DNA molecules of each allele. This method would therefore give a more accurate measure of the relative amounts of each allele than a method that does not use this approach. For methods where it is desirable to minimize the relative amount of allele bias, this method will provide more accurate data.
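The example above can be reproduced with a small, deterministic Python sketch. It assumes an amplification bias in which every molecule of allele A is copied 6,500 times and every molecule of allele B 3,500 times (an arbitrary illustrative choice that yields 100,000 reads), and it uses simple labels in place of real random barcode sequences.

```python
from collections import Counter


def allele_counts(reads, deduplicate=False):
    """Count reads per allele; optionally collapse reads that share a barcode."""
    if deduplicate:
        reads = set(reads)  # one (allele, barcode) pair per original molecule
    return Counter(allele for allele, _barcode in reads)


def make_amplified_pool(copies_per_molecule, molecules_per_allele=10):
    """Build a biased pool: each original molecule is copied a fixed number of times."""
    reads = []
    for allele, copies in copies_per_molecule.items():
        for molecule in range(molecules_per_allele):
            barcode = f"{allele}-{molecule}"  # placeholder for a random barcode
            reads.extend([(allele, barcode)] * copies)
    return reads


pool = make_amplified_pool({"A": 6500, "B": 3500})
raw = allele_counts(pool)
unique = allele_counts(pool, deduplicate=True)
print(len(pool), dict(raw), dict(unique))
```

In this sketch the raw read counts reflect the amplification bias, whereas the deduplicated counts recover the ten original molecules of each allele.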
[000493] The association of the sequenced fragment to the target locus can be achieved in several ways. In one embodiment, a sequence of sufficient length is obtained from the target fragment to span the molecule barcode, as well as a sufficient number of unique bases corresponding to the target sequence to allow unambiguous identification of the target locus. In another embodiment, the molecular bar code initiator that contains the randomly generated molecular bar code may also contain a locus-specific bar code (locus bar code) that identifies the target to which it is associated. This locus barcode would be identical among all molecular barcode primers for each individual target and, therefore, all resulting amplicons, but different from all other targets. In one embodiment, the tagging method described here can be combined with a unilateral nesting protocol.
[000494] In one embodiment, the design and generation of molecular barcode primers can be reduced to practice as follows: the molecular barcode primers can consist of a sequence that is not complementary to the target sequence, followed by the random molecular barcode region, followed by a target-specific sequence. The sequence 5' of the molecular barcode can be used for subsequent PCR amplification and can comprise sequences useful in converting the amplicon into a library for sequencing. The random molecular barcode sequence could be generated in several ways. The preferred method synthesizes the molecule tagging primer in such a way as to include all four bases in the reaction during synthesis of the barcode region. All or various combinations of bases can be specified using the IUPAC DNA ambiguity codes. In this way, the synthesized collection of molecules will contain a random mixture of sequences in the molecular barcode region. The length of the barcode region will determine how many primers will contain unique barcodes. The number of unique sequences is related to the length of the barcode region as N^L, where N is the number of bases, typically 4, and L is the length of the barcode. A five-base barcode can yield up to 1,024 unique sequences; an eight-base barcode can yield 65,536 unique barcodes. In one embodiment, the DNA can be measured by a sequencing method, where the sequence data represents the sequence of a single molecule. This can include methods in which single molecules are sequenced directly or methods in which single molecules are amplified to form clones detectable by the sequencing instrument, but which still represent single molecules, here called clonal sequencing.
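The N^L relationship can be checked with a few lines of Python. The second function is an addition not found in the description: it estimates, under the assumption that barcodes are assigned independently and uniformly at random, the probability that a given number of molecules all receive distinct barcodes, which may help in choosing L.

```python
def barcode_space(barcode_length, n_bases=4):
    """Number of distinct barcodes of the given length: N^L."""
    return n_bases ** barcode_length


def probability_all_unique(n_molecules, barcode_length, n_bases=4):
    """Probability that n_molecules randomly tagged molecules get distinct barcodes.

    Assumes uniform, independent barcode assignment (an idealization).
    """
    space = barcode_space(barcode_length, n_bases)
    probability = 1.0
    for already_used in range(n_molecules):
        probability *= max(0.0, 1.0 - already_used / space)
    return probability


print(barcode_space(5), barcode_space(8))
print(probability_all_unique(20, 5))
```

Because collisions become likely well before the number of molecules reaches N^L, a barcode region somewhat longer than the minimum may be preferable when nearly all molecules need to be distinguishable.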
Some Embodiments
[000495] In some embodiments, a method for generating a report describing the ploidy state of a chromosome in a gestating fetus is described here, the method comprising: obtaining a first sample containing DNA from the mother of the fetus and DNA from the fetus; obtaining genotypic data from one or both parents of the fetus; preparing the first sample by isolating the DNA in order to obtain a prepared sample; measuring the DNA in the prepared sample at a plurality of polymorphic loci; calculating, on a computer, allele counts or allele count probabilities at the plurality of polymorphic loci from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses concerning the expected allele count probabilities at the plurality of polymorphic loci on the chromosome for different possible ploidy states of the chromosome; constructing, on a computer, a joint distribution model for the allele count probability of each polymorphic locus on the chromosome for each ploidy hypothesis using genotypic data from one or both parents of the fetus; determining, on a computer, a relative probability of each ploidy hypothesis using the joint distribution model and the allele count probabilities calculated for the prepared sample; determining the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability; and generating a report describing the determined ploidy state.
[000496] In some embodiments, the method is used to determine the ploidy state of a plurality of gestating fetuses in a plurality of respective mothers, the method further comprising: determining the percentage of DNA that is of fetal origin in each of the prepared samples; and where the step of measuring the DNA in the prepared sample is done by sequencing a number of DNA molecules in each of the prepared samples, where more DNA molecules are sequenced from those prepared samples that have a smaller fraction of fetal DNA than from those prepared samples that have a larger fraction of fetal DNA.
[000497] In some embodiments, the method is used to determine the ploidy state of a plurality of gestating fetuses in a plurality of respective mothers, and where the measurement of the DNA in the prepared sample is done, for each of the fetuses, by sequencing a first fraction of the prepared DNA sample to give a first set of measurements, the method further comprising: making a first relative probability determination for each of the ploidy hypotheses for each of the fetuses, given the first set of DNA measurements; resequencing a second fraction of the prepared sample from those fetuses where the first relative probability determination indicates that a ploidy hypothesis corresponding to an aneuploid fetus has a significant but not conclusive probability, to give a second set of measurements; making a second relative probability determination for the ploidy hypotheses for those fetuses using the second set of measurements and, optionally, also the first set of measurements; and determining the ploidy states of the fetuses whose second sample was resequenced by selecting the ploidy state corresponding to the hypothesis with the greatest probability, as determined by the second relative probability determination.
[000498] In some embodiments, a composition of matter is described, the composition of matter comprising: a preferentially enriched DNA sample, where the preferentially enriched DNA sample has been preferentially enriched at a plurality of polymorphic loci from a first DNA sample, where the first DNA sample consisted of a mixture of maternal DNA and fetal DNA derived from maternal plasma, where the degree of enrichment is at least a factor of 2, and where the allelic bias between the first sample and the preferentially enriched sample is, on average, selected from the group consisting of less than 2%, less than 1%, less than 0.5%, less than 0.2%, less than 0.1%, less than 0.05%, less than 0.02% and less than 0.01%. In some embodiments, a method for creating a sample of such preferentially enriched DNA is described.
[000499] In some embodiments, a method is described for determining the presence or absence of a fetal aneuploidy in a maternal tissue sample comprising fetal and maternal genomic DNA, where the method comprises: (a) obtaining a mixture of fetal and maternal genomic DNA from said maternal tissue sample; (b) selectively enriching the mixture of fetal and maternal DNA at a plurality of polymorphic alleles; (c) distributing selectively enriched fragments from the mixture of fetal and maternal genomic DNA of step (a) to provide reaction samples comprising a single genomic DNA molecule or amplification products of a single genomic DNA molecule; (d) conducting massively parallel DNA sequencing of the selectively enriched fragments of genomic DNA in the reaction samples of step (c) to determine the sequence of said selectively enriched fragments; (e) identifying the chromosomes to which the sequences obtained in step (d) belong; (f) analyzing the data from step (d) to determine (i) the number of genomic DNA fragments from step (d) that belong to at least one first target chromosome that is presumed to be diploid in both the mother and the fetus, and (ii) the number of genomic DNA fragments from step (d) that belong to a second target chromosome, where said second chromosome is suspected to be aneuploid in the fetus; (g) calculating an expected distribution of the number of genomic DNA fragments from step (d) for the second target chromosome if the second target chromosome is euploid, using the number determined in step (f), part (i); (h) calculating an expected distribution of the number of genomic DNA fragments from step (d) for the second target chromosome if the second target chromosome is aneuploid, using the number determined in step (f), part (i), and an estimated fraction of fetal DNA found in the mixture of step (b); and (i) using a maximum likelihood or maximum a posteriori approach to determine whether the number of genomic DNA fragments determined in step (f), part (ii), is more likely to be part of the distribution calculated in step (g) or the distribution calculated in step (h); thus indicating the presence or absence of fetal aneuploidy.
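Steps (g) through (i) can be illustrated with a simplified Python sketch. The description does not fix the form of the expected distributions; the sketch below assumes, purely for illustration, Poisson-distributed read counts, a known euploid ratio between target and reference chromosome counts, and a fetal trisomy that raises the expected target count by a factor of (1 + F/2), where F is the fetal fraction. All names and numbers are hypothetical.

```python
import math


def poisson_log_pmf(count, mean):
    """Log of the Poisson probability of observing `count` given `mean`."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def call_aneuploidy(target_reads, reference_reads, euploid_ratio, fetal_fraction):
    """Compare the likelihood of the target count under euploidy and trisomy."""
    # Step (g): expected count if the target chromosome is euploid.
    mean_euploid = reference_reads * euploid_ratio
    # Step (h): expected count if the fetus carries one extra copy.
    mean_trisomy = mean_euploid * (1.0 + fetal_fraction / 2.0)
    # Step (i): maximum likelihood choice between the two hypotheses.
    log_likelihood_ratio = (poisson_log_pmf(target_reads, mean_trisomy)
                            - poisson_log_pmf(target_reads, mean_euploid))
    call = "trisomy" if log_likelihood_ratio > 0 else "euploid"
    return call, log_likelihood_ratio


# Hypothetical counts: 1,000,000 reference reads, euploid ratio 0.1, F = 10%.
print(call_aneuploidy(105000, 1000000, 0.1, 0.10)[0])
print(call_aneuploidy(100000, 1000000, 0.1, 0.10)[0])
```

A maximum a posteriori variant would add the log prior odds of trisomy to the log likelihood ratio before comparing it with zero.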
Experimental Section [000500] The modalities presently described are described in the following Examples, which are presented to assist in understanding the description, and should not be construed to limit in any way the scope of the description as defined in the following claims. The following examples are presented in order to provide those skilled in the art with a complete description of how to use the described modalities, and are not intended to limit the scope of the description or to represent that the experiments below are all or the only experiments performed.
Efforts were made to ensure accuracy in relation to the numbers used (for example, quantities, temperature, etc.), but some experimental errors and deviations should be considered. Unless otherwise stated, the parts are parts by volume, and the temperature is in degrees centigrade. It should be understood that variations in the described methods can be made without changing the fundamental aspects that the experiments aim to illustrate.
Experiment 1
[000501] The objective was to show that a Bayesian maximum likelihood estimation (MLE) algorithm that uses the parental genotypes to calculate the fetal fraction improves the accuracy of non-invasive prenatal diagnosis of trisomy compared with published methods.
[000502] Simulated sequencing data for maternal cfDNA was created by sampling reads obtained from trisomy 21 and respective maternal cell lines. The rates of correct disomy and trisomy determination were determined from 500 simulations at various fetal fractions for a published method (Chiu et al., BMJ 2011; 342: c7401) and for the MLE-based algorithm. The simulations were validated by obtaining 5 million shotgun reads from each of four pregnant mothers and the respective fathers, collected under an IRB-approved protocol. Parental genotypes were obtained on a 290K SNP array. (See Figure 14).
[000503] In simulations, the MLE-based approach reached 99.0% accuracy for fetal fractions as low as 9% and reported confidence levels that corresponded well to the overall accuracy. These results were validated using four real samples, where all determinations were correct with a calculated confidence greater than 99%. In contrast, the implementation of the algorithm published by Chiu et al. required a fetal fraction of 18% to achieve 99.0% accuracy, and achieved only 87.8% accuracy at 9% fetal DNA.
[000504] The determination of the fetal fraction from the parental genotypes, together with an MLE-based approach, achieves greater accuracy than the published algorithms in the expected fetal fractions during the 1st and the beginning of the 2nd trimester. In addition, the method described here produces a confidence metric that is crucial for determining the reliability of the result, especially in low fetal fractions where the detection of ploidy is more difficult. Published methods use a less accurate threshold method to determine ploidy based on large sets of disomy training data, an approach that allows predefining a false positive rate. In addition, without a reliable metric, published methods are at risk of reporting false negative results when there is insufficient fetal cfDNA to make a determination. In some modalities, a confidence estimate is calculated for the given ploidy state.
Experiment 2 [000505] The objective was to improve non-invasive detection of fetal trisomy 18, 21 and X particularly in samples composed of low fetal fraction using a targeted sequencing approach combined with parental genotypes and Hapmap data in a Bayesian estimation algorithm by maximum likelihood (MLE). [000506] Maternal samples from four euploids and two trisomy-positive pregnancies and the respective paternal samples were obtained under an IRB-approved protocol from patients where fetal karyotype was known. maternal cfDNA was extracted from the plasma and approximately 10 million sequence readings were obtained after preferential enrichment of these specific target SNPs. The parents' samples were similarly sequenced to obtain the genotypes.
[000507] The described algorithm correctly determined the disomy of chromosome 18 and 21 for all euploid samples and normal chromosomes from aneuploid samples. The determination of trisomy 18 and 21 was correct, as were the copy numbers of the X chromosome in male and female fetuses. The confidence produced by the algorithm was in excess of 98% in all cases. [000508] The method described accurately reported the ploidy of all chromosomes tested from six samples, including samples composed of less than 12% of fetal DNA, which represent about 30% of the samples from the 1st and the beginning of the 2nd ° trimester of gestation. The crucial difference between the present MLE algorithm and the published methods is that it leverages parental genotypes and Hapmap data to improve accuracy and generate a reliable metric. At low fetal fractions, all methods become less accurate; it is important to correctly identify samples without sufficient fetal cfDNA to make a reliable determination. Others have used specific Y chromosome probes to estimate the fetal fraction of male fetuses, but simultaneous parental genotyping allows the estimation of the fetal fraction for both sexes. Another inherent limitation of published methods using shotgun sequencing is that the accuracy of ploidy determination varies between chromosomes due to differences in factors such as GC richness. The targeted sequencing approach is quite independent of such chromosome scale variations and results in more consistent performance between chromosomes. Experiment 3 [000509] The objective was to determine whether trisomy is detectable with a high degree of confidence in a triploid fetus, using new computer techniques to analyze the SNP loci of free fetal DNA in maternal plasma.
[000510] 20 mL of blood was collected from a pregnant patient after an abnormal ultrasound. After centrifugation, maternal DNA was extracted from the leukocyte cream (DNeasy, QIAGEN); cell-free DNA was extracted from plasma (QIAamp QIAGEN). Targeted sequencing was applied to SNP loci on chromosomes 2, 21, and X in both DNA samples. The Bayesian maximum likelihood estimate selected the most likely hypothesis from the set of all possible ploidy states. The method determines the fetal DNA fraction, the ploidy state and the explicit reliance on the ploidy determination. No assumptions are made about the ploidy of a reference chromosome. The diagnosis uses a statistical test that is independent of the sequence reading counts, which is the recent state of the art.
[000511] The present method accurately diagnosed trisomy of chromosomes 2 and 21. The child's fraction was estimated at 11.9% [CI 11.7-12.1]. The fetus was found to have one maternal copy and two paternal copies of chromosomes 2 and 21 with confidence of effectively 1 (error probability <10-30). This was achieved with 92,600 and 258,100 readings on chromosomes 2 and 21, respectively.
[000512] This is the first demonstration of non-invasive prenatal diagnosis of trisomic chromosomes from maternal blood in which the fetus was triploid, as confirmed by a metaphase karyotype. Existing non-invasive diagnostic methods would not detect aneuploidy in this sample. Current methods rely on an excess of sequence reads on a trisomic chromosome relative to disomic reference chromosomes, but a triploid fetus has no disomic reference. Furthermore, existing methods would not achieve a similarly high-confidence ploidy determination with this fraction of fetal DNA and this number of sequence reads. It is straightforward to extend the approach to all 24 chromosomes.
Experiment 4
[000513] The following protocol was used for 800-plex amplification of DNA isolated from maternal plasma from a euploid pregnancy, and also of genomic DNA from a trisomy 21 cell line, using standard PCR (meaning that no nesting was used). Library preparation and amplification involved single-tube blunt ending followed by A-tailing. Adapter ligation was performed using the ligation kit found in the AGILENT SURESELECT kit, and PCR was performed for 7 cycles. This was followed by 15 cycles of STA (95 °C for 30 s; 72 °C for 1 min; 60 °C for 4 min; 65 °C for 1 min; 72 °C for 30 s) with 800 different primer pairs targeting SNPs on chromosomes 2, 21 and X. The reaction was run at a primer concentration of 12.5 nM. The DNA was then sequenced on an ILLUMINA GAIIX sequencer. The sequencer produced 1.9 million reads, of which 92% mapped to the genome; of the reads that mapped to the genome, more than 99% mapped to one of the regions targeted by the primers. The numbers were essentially the same for plasma DNA and genomic DNA. Figure 15 shows the ratio of the two alleles for the ~780 SNPs that were detected by the sequencer in genomic DNA taken from a cell line with a known trisomy of chromosome 21. Note that allele ratios are plotted here for ease of visualization, because allele distributions are not easy to read visually. The circles represent SNPs on disomic chromosomes, while the stars represent SNPs on a trisomic chromosome. Figure 16 is another representation of the same data as in Figure X, where the Y axis is the relative number of A and B measured for each SNP, and the X axis is the SNP number, with the SNPs separated by chromosome. In Figure 16, SNPs 1 to 312 are on chromosome 2, SNPs 313 to 605 are on chromosome 21, which is trisomic, and SNPs 606 to 800 are on chromosome X. The data from chromosomes 2 and X are consistent with a disomic chromosome, as the relative sequence counts fall into three groups: AA at the top of the graph, BB at the bottom, and AB in the middle. The data from chromosome 21, which is trisomic, show four groups: AAA at the top of the graph, AAB around the 0.65 line (2/3), ABB around the 0.35 line (1/3), and BBB at the bottom.
[000514] Figure 17 shows data for the same 800-plex protocol, but measured on DNA that was amplified from four plasma samples from pregnant women. For these four samples, seven groups of points are expected: (1) along the top of the graph are the loci where the mother and the fetus are both AA; (2) slightly below the top of the graph are the loci where the mother is AA and the fetus is AB; (3) just above the 0.5 line are the loci where the mother is AB and the fetus is AA; (4) along the 0.5 line are the loci where the mother and the fetus are both AB; (5) just below the 0.5 line are the loci where the mother is AB and the fetus is BB; (6) just above the bottom of the graph are the loci where the mother is BB and the fetus is AB; and (7) along the bottom of the graph are the loci where the mother and the fetus are both BB. The smaller the fetal fraction, the smaller the separation between groups (1) and (2), between groups (3), (4) and (5), and between groups (6) and (7). The separation is expected to be half the fraction of the DNA that is of fetal origin. For example, if the DNA is 20% fetal and 80% maternal, groups (1) to (7) are expected to be centered at 1.0, 0.9, 0.6, 0.5, 0.4, 0.1 and 0.0, respectively; see, for example, Figure 17, POOL1_BC5_ref_rate. If, instead, the DNA is 8% fetal and 92% maternal, groups (1) to (7) are expected to be centered at 1.00, 0.96, 0.54, 0.50, 0.46, 0.04 and 0.00, respectively; see, for example, Figure 17, POOL1_BC2_ref_rate. If no fetal DNA is detected, groups (2), (3), (5) and (6) are not expected to appear; equivalently, the separation is zero, so (1) and (2) lie on top of each other, as do (3), (4) and (5), and also (6) and (7); see, for example, Figure 17, POOL1_BC7_ref_rate. Note that the fetal fraction for Figure 17, POOL1_BC1_ref_rate is approximately 25%.
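As an illustrative sketch that is not part of the patent text, the seven expected group centers can be computed by mixing the maternal and fetal allele fractions in proportion to the fetal fraction. The function name `expected_group_centers` and the genotype encoding are assumptions made for this example, and a disomic locus with no allelic bias is assumed.

```python
# Hypothetical helper: expected fraction of A reads for each
# (mother, fetus) genotype pair at a disomic locus, given the fetal fraction.
A_FRACTION = {"AA": 1.0, "AB": 0.5, "BB": 0.0}

# The seven maternal/fetal genotype contexts, ordered from top to bottom of the plot.
CONTEXTS = [
    ("AA", "AA"), ("AA", "AB"),
    ("AB", "AA"), ("AB", "AB"), ("AB", "BB"),
    ("BB", "AB"), ("BB", "BB"),
]

def expected_group_centers(fetal_fraction):
    """Return the expected A-allele ratio for each of the seven contexts."""
    maternal_fraction = 1.0 - fetal_fraction
    return [
        maternal_fraction * A_FRACTION[mother] + fetal_fraction * A_FRACTION[fetus]
        for mother, fetus in CONTEXTS
    ]

# The two worked examples from the paragraph above.
print([round(c, 2) for c in expected_group_centers(0.20)])
# [1.0, 0.9, 0.6, 0.5, 0.4, 0.1, 0.0]
print([round(c, 2) for c in expected_group_centers(0.08)])
# [1.0, 0.96, 0.54, 0.5, 0.46, 0.04, 0.0]
```

With a fetal fraction of zero the same function returns only three distinct values (1.0, 0.5 and 0.0), which matches the statement that the groups collapse onto one another when no fetal DNA is detected.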
Experiment 5
[000515] Most methods of DNA amplification and measurement produce some allelic bias, in which the two alleles typically found at a locus are detected with intensities or counts that are not representative of the actual amounts of the alleles in the DNA sample. For example, for a single individual at a heterozygous locus, one expects to see a 1:1 ratio of the two alleles, which is the theoretical ratio for a heterozygous locus; however, because of allelic bias, one may see 55:45 or even 60:40. Note also that in the context of sequencing, if the depth of read is low, simple stochastic noise can produce significant allelic bias. In one embodiment, it is possible to model the behavior of each SNP such that, if a consistent bias is observed for particular alleles, that bias can be corrected. Figure 18 shows the fraction of data that can be explained by binomial variance, before and after bias correction. In Figure 18, the stars represent the allelic bias observed in the raw sequence data for the 800-plex experiment; the circles represent the allelic bias after correction. Note that if there were no allelic bias at all, the data would be expected to fall along the x = y line. A similar data set, produced by amplifying DNA using a 150-plex targeted amplification, fell very close to the 1:1 line after bias correction.
Experiment 6
[000516] Universal amplification of DNA using ligated adapters with primers specific to the adapter tags, where the primer annealing and extension times are limited to a few minutes, has the effect of enriching the proportion of shorter DNA strands. Most library protocols designed for creating DNA libraries suitable for sequencing contain such a step, and example protocols are published and well known to those skilled in the art. In some embodiments of the invention, adapters with a universal tag are ligated to the plasma DNA and amplified using primers specific to the adapter tag. In some embodiments, the universal tag can be the same tag used for sequencing, it can be a universal tag used only for PCR amplification, or it can be a set of tags. Since fetal DNA is typically short, while maternal DNA can be both short and long, this method has the effect of enriching the proportion of fetal DNA in the mixture. Free-floating DNA, thought to be DNA from apoptotic cells, which contains both fetal and maternal DNA, is short, mostly under 200 bp. Cellular DNA released by cell lysis, a common phenomenon after phlebotomy, is typically almost exclusively maternal, and is also quite long, mostly above 500 bp. Therefore, blood samples that have been left standing for more than a few minutes will contain a mixture of short DNA (fetal + maternal) and longer DNA (maternal). Performing a universal amplification with relatively short extension times on maternal plasma, followed by targeted amplification, will tend to increase the relative proportion of fetal DNA compared with plasma that was amplified using targeted amplification alone. This can be seen in Figure 19, which shows the fetal percentage measured when the input is plasma DNA (vertical axis) versus the fetal percentage measured when the input is plasma DNA that has had a library prepared using the ILLUMINA GAIIx library preparation protocol (horizontal axis). All points fall below the line, indicating that the library preparation step enriches the fraction of DNA that is of fetal origin. Two plasma samples that were red, indicating hemolysis and therefore an increased amount of long maternal DNA from cell lysis, show a particularly significant enrichment of the fetal fraction when library preparation is performed before targeted amplification. The method described here is particularly useful in cases where there is hemolysis, or where some other situation has occurred in which cells comprising relatively long strands of contaminating DNA have lysed, contaminating the mixed sample of short DNA with long DNA. Typically, the relatively short annealing and extension times are between 30 seconds and 2 minutes, although they can be as short as 5 or 10 seconds or less, or as long as 5 or 10 minutes.
Experiment 7
[000517] The following protocol was used for 1,200-plex amplification of DNA isolated from maternal plasma from a euploid pregnancy, and also of genomic DNA from a trisomy 21 cell line, using a direct PCR protocol and also a semi-nested approach. Library preparation and amplification involved single-tube blunt ending followed by A-tailing. Adapter ligation was performed using a modification of the ligation kit found in the AGILENT SURESELECT kit, and PCR was performed for 7 cycles. In the targeted primer pool, there were 550 assays for SNPs on chromosome 21, and 325 assays for SNPs on each of chromosomes 1 and X. Both protocols involved 15 cycles of STA (95 °C for 30 s; 72 °C for 1 min; 60 °C for 4 min; 65 °C for 30 s; 72 °C for 30 s) using a primer concentration of 16 nM. The semi-nested PCR protocol involved a second amplification of 15 cycles of STA (95 °C for 30 s; 72 °C for 1 min; 60 °C for 4 min; 65 °C for 30 s; 72 °C for 30 s) using an inner forward tag concentration of 29 nM and a reverse tag concentration of 1 µM or 0.1 µM. The DNA was then sequenced on an ILLUMINA GAIIX sequencer. For the direct PCR protocol, 73% of the reads mapped to the genome; for the semi-nested protocol, 97.2% of the sequence reads mapped to the genome. Thus, the semi-nested protocol yields approximately 30% more information, presumably due mostly to the elimination of primers that are most likely to cause primer dimers.
[000518] The depth of read variability tends to be higher when using the semi-nested protocol than when using the direct PCR protocol (see Figure 20), where the diamonds refer to the depth of read for loci run with the semi-nested protocol, and the squares refer to the depth of read for loci run with no nesting. The SNPs are ordered by depth of read for the diamonds, so the diamonds all fall on a curved line, while the squares appear loosely correlated; the ordering of the SNPs is arbitrary, and it is the height of a point that denotes its depth of read, rather than its left-to-right position.
[000519] In some embodiments, the methods described here can achieve excellent depth of read (DOR) variances. For example, in one version of this experiment (Figure 21) using 1,200-plex direct PCR amplification of genomic DNA, of the 1,200 assays, 1,186 had a DOR greater than 10; the average depth of read was 400; and 1,063 assays (88.6%) had a depth of read between 200 and 800, the ideal window, in which the number of reads for each allele is high enough to give meaningful data but not so high that the marginal use of those reads is particularly small. Only 12 alleles had a higher depth of read, the highest being 1,035 reads. The standard deviation of the DOR was 290, the mean DOR was 453, the coefficient of variance of the DOR was 64%, there were 950,000 total reads, and 63.1% of the reads mapped to the genome. In another experiment (Figure 22), using a 1,200-plex semi-nested protocol, the DOR was higher. The standard deviation of the DOR was 583, the mean DOR was 630, the coefficient of variance of the DOR was 93%, there were 870,000 total reads, and 96.3% of the reads mapped to the genome. Note that in both cases the SNPs are ordered by depth of read for the mother, so the curved line represents the maternal depth of read. The distinction between child and parent is not significant; only the trend matters for the purpose of this explanation.
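The coefficient figures quoted above follow from dividing the standard deviation of the depth of read by its mean. The short sketch below simply reproduces that arithmetic from the numbers given in the text; the function name is an assumption made for this example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a fraction of the mean."""
    return std_dev / mean

# Values quoted in the text for the two 1,200-plex experiments.
direct_pcr = coefficient_of_variation(290, 453)    # Figure 21, direct PCR
semi_nested = coefficient_of_variation(583, 630)   # Figure 22, semi-nested

print(round(100 * direct_pcr))   # 64
print(round(100 * semi_nested))  # 93
```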
Experiment 8
[000520] In one experiment, the 1,200-plex semi-nested PCR protocol was used to amplify DNA from one cell and from three cells. This experiment is relevant to prenatal aneuploidy testing using fetal cells isolated from maternal blood, and to preimplantation genetic diagnosis using biopsied blastomeres or trophectoderm samples. There were three replicates of 1 and 3 cells from two individuals (46 XY and 47 XX + 21) per condition. The assays targeted chromosomes 1, 21 and X. Three different lysis methods were used: ARCTURUS, MPERv2 and alkaline lysis. Sequencing was performed by multiplexing 48 samples in one sequencing lane. The algorithm returned correct ploidy determinations for each of the three chromosomes and for each of the replicates.
Experiment 9
[000521] In one experiment, four maternal plasma samples were prepared and amplified using a 9,600-plex hemi-nested protocol. The samples were prepared as follows: up to 40 mL of maternal blood was centrifuged to isolate the buffy coat and the plasma. Maternal genomic DNA was prepared from the buffy coat, and paternal DNA was prepared from a blood sample or a saliva sample. Cell-free DNA in the maternal plasma was isolated using the QIAGEN CIRCULATING NUCLEIC ACID kit and eluted in 45 µL of TE buffer according to the manufacturer's instructions. Universal ligation adapters were attached to the ends of each molecule in 35 µL of purified plasma DNA, and the libraries were amplified for 7 cycles using adapter-specific primers. The libraries were purified with AGENCOURT AMPURE beads and eluted in 50 µL of water.
[000522] 3 µL of the DNA was amplified with 15 cycles of STA (95 °C for 10 min for initial polymerase activation; then 15 cycles of 95 °C for 30 s, 72 °C for 10 s, 65 °C for 1 min, 60 °C for 8 min, 65 °C for 3 min and 72 °C for 30 s; and a final extension at 72 °C for 2 min) using a 14.5 nM primer concentration of 9,600 target-specific tagged reverse primers and one library-adapter-specific forward primer at 500 nM.
[000523] The hemi-nested PCR protocol involved a second amplification of a dilution of the first STA product for 15 cycles of STA (95 °C for 10 min for initial polymerase activation; then 15 cycles of 95 °C for 30 s, 65 °C for 1 min, 60 °C for 5 min, 65 °C for 5 min and 72 °C for 30 s; and a final extension at 72 °C for 2 min) using a reverse tag concentration of 1,000 nM and a concentration of 16.6 nM for each of the 9,600 target-specific forward primers.
[000524] An aliquot of the STA products was then amplified by standard PCR for 10 cycles with 1 µM of tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries carrying different barcodes and purified using a spin column.
[000525] In this way, 9,600 primers were used in single-well reactions; the primers were designed to target SNPs found on chromosomes 1, 2, 13, 18, 21, X and Y. The amplicons were then sequenced using an ILLUMINA GAIIX sequencer. Per sample, approximately 3.9 million reads were generated by the sequencer, with 3.7 million reads mapping to the genome (94%), and of these, 2.9 million reads (74%) mapped to targeted SNPs, with an average depth of read of 344 and a median depth of read of 255. The fetal fractions for the four samples were 9.9%, 18.9%, 16.3% and 21.2%.
[000526] The relevant maternal and paternal genomic DNA samples were amplified using a semi-nested 9,600-plex protocol and sequenced. The semi-nested protocol differs in that it applies 9,600 outer forward primers and tagged reverse primers at 7.3 nM in the first STA. The thermocycling conditions, the composition of the second STA, and the barcoding PCR were the same as for the hemi-nested protocol.
[000527] The sequencing data were analyzed using computer-based methods described here and the ploidy status was determined on six chromosomes for fetuses whose DNA was present in the 4 samples of maternal plasma. Ploidy determinations for all 28 chromosomes in the set were determined correctly with confidence levels above 99.2%, except for one chromosome that was determined correctly, but with 83% confidence.
[000528] Figure 23 shows the depth of read of the 9,600-plex hemi-nested approach along with the depth of read of the 1,200-plex semi-nested approach described in Experiment 7; the numbers of SNPs with a depth of read greater than 100, greater than 200 and greater than 400 were significantly higher than in the 1,200-plex protocol. The number of reads at the 90th percentile can be divided by the number of reads at the 10th percentile to give a dimensionless metric indicative of the uniformity of the depth of read; the smaller the number, the more uniform (narrower) the depth of read. The average 90th percentile / 10th percentile ratio is 11.5 for the method run in Experiment 9, while it is 5.6 for the method run in Experiment 7. A narrower depth of read for a given protocol plexity is better for sequencing efficiency, as fewer sequence reads are needed to ensure that a given percentage of reads is above a threshold number of reads.
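The uniformity metric described above can be sketched as follows. This is an illustration only: the function name `depth_uniformity_ratio` and both lists of per-SNP depths are hypothetical, and the percentiles are taken with the default interpolation of Python's `statistics.quantiles`, which may differ from whatever estimator was used for the figures in the text.

```python
import statistics

def depth_uniformity_ratio(depths):
    """Return the 90th-percentile depth divided by the 10th-percentile depth."""
    # With n=10, quantiles() returns nine cut points: the 10th to 90th percentiles.
    deciles = statistics.quantiles(depths, n=10)
    p10, p90 = deciles[0], deciles[-1]
    if p10 <= 0:
        raise ValueError("10th-percentile depth must be positive")
    return p90 / p10

# Hypothetical per-SNP depths of read, for illustration only.
narrow_depths = list(range(300, 500, 10))   # tightly clustered depths
wide_depths = list(range(50, 1050, 50))     # widely spread depths

# A smaller ratio indicates a more uniform depth of read.
print(depth_uniformity_ratio(narrow_depths) < depth_uniformity_ratio(wide_depths))
```

A ratio close to 1 means nearly every SNP receives a similar number of reads, so fewer total reads are needed to bring the shallowest SNPs above a given threshold.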
Experiment 10
[000529] In one experiment, four maternal plasma samples were prepared and amplified using a semi-nested 9,600-plex protocol. The details of Experiment 10 were very similar to those of Experiment 9, the exception being the nesting protocol, and including the identity of the four samples. Ploidy determinations for all 28 chromosomes in the set were made correctly with confidences above 99.7%. 7.6 million (97%) of the reads mapped to the genome, and 6.3 million (80%) of the reads mapped to the targeted SNPs. The average depth of read was 751, and the median depth of read was 396.
Experiment 11
[000530] In one experiment, three maternal plasma samples were split into five equal portions, and each portion was amplified using either 2,400 multiplexed primers (four portions) or 1,200 multiplexed primers (one portion) with a semi-nested protocol, for a total of 10,800 primers. After amplification, the portions were pooled for sequencing. The details of Experiment 11 were very similar to those of Experiment 9, the exceptions being the nesting protocol and the split-and-pool approach. Ploidy determinations for all 21 chromosomes in the set were made correctly with confidences above 99.7%, except for one missed call where the confidence was 83%. 3.4 million reads mapped to targeted SNPs, the average depth of read was 404 and the median depth of read was 258.
Experiment 12
[000531] In one experiment, four maternal plasma samples were split into four equal portions, and each portion was amplified using 2,400 multiplexed primers with a semi-nested protocol, for a total of 9,600 primers. After amplification, the portions were pooled for sequencing. The details of Experiment 12 were very similar to those of Experiment 9, the exceptions being the nesting protocol and the split-and-pool approach. Ploidy determinations for all 28 chromosomes in the set were made correctly with confidences above 97%, except for one missed call where the confidence was 78%. 4.5 million reads mapped to targeted SNPs, the average depth of read was 535 and the median depth of read was 412.
Experiment 13
[000532] In one experiment, four maternal plasma samples were prepared and amplified using a 9,600-plex triply hemi-nested protocol, for a total of 9,600 primers. The details of Experiment 13 were very similar to those of Experiment 9, the exception being the nesting protocol, which involved three rounds of amplification of 15, 10 and 15 STA cycles, respectively. Ploidy determinations for 27 of the 28 chromosomes in the set were made correctly with confidences above 99.9%, except for one that was made correctly with a confidence of 94.6%, and there was one missed call with a confidence of 80.8%. 3.5 million reads mapped to targeted SNPs, the average depth of read was 414 and the median depth of read was 249.
Experiment 14
[000533] In one experiment, 45 sets of cells were amplified using a 1,200-plex protocol and sequenced, and ploidy determinations were made on three chromosomes. Note that this experiment is meant to simulate the conditions of performing preimplantation genetic diagnosis on single-cell biopsies from day 3 embryos, or on trophectoderm biopsies from day 5 embryos. 15 individual single cells and 30 sets of three cells were placed in 45 individual reaction tubes, for a total of 45 reactions, where each reaction contained cells from only one cell line but the different reactions contained cells from different cell lines. The cells were prepared in 5 µL of wash buffer and lysed by adding 5 µL of ARCTURUS PICOPURE lysis buffer (APPLIED BIOSYSTEMS) and incubating at 56 °C for 20 min and then at 95 °C for 10 min.
[000534] The DNA of the single cells / three cells was amplified with 25 cycles of STA (95 °C for 10 min for initial polymerase activation; then 25 cycles of 95 °C for 30 s, 72 °C for 10 s, 65 °C for 1 min, 60 °C for 8 min, 65 °C for 3 min and 72 °C for 30 s; and a final extension at 72 °C for 2 min) using a 50 nM primer concentration of 1,200 target-specific forward and tagged reverse primers.
[000535] The semi-nested PCR protocol involved three parallel second amplifications of a dilution of the first STA product for 20 cycles of STA (95 °C for 10 min for initial polymerase activation; then 15 cycles of 95 °C for 30 s, 65 °C for 1 min, 60 °C for 5 min, 65 °C for 5 min and 72 °C for 30 s; and a final extension at 72 °C for 2 min) using a reverse tag-specific primer concentration of 1,000 nM and a concentration of 60 nM for each of 400 target-specific nested forward primers. In the three parallel 400-plex reactions, all 1,200 targets amplified in the first STA were thus amplified.
[000536] An aliquot of the STA products was then amplified by standard PCR for 15 cycles with 1 µM of tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries carrying different barcodes and purified using a spin column.
[000537] In this way, 1,200 primers were used in the single-cell reactions; the primers were designed to target SNPs found on chromosomes 1, 21 and X. The amplicons were then sequenced using an ILLUMINA GAIIX sequencer. Per sample, approximately 3.9 million reads were generated by the sequencer, with 500,000 to 800,000 reads mapping to the genome (74% to 94% of all reads per sample).
[000538] The relevant maternal and paternal genomic DNA samples from the cell lines were analyzed using the same 1,200-plex semi-nested assay pool, with a similar protocol using fewer cycles and a 1,200-plex second STA, and sequenced.
[000539] The sequencing data were analyzed using computer-based methods described here and the ploidy status was determined on the three chromosomes for the samples.
[000540] Figure 24 shows normalized depth of read ratios (vertical axis) for six samples at three chromosomes (1 = chromosome 1; 2 = chromosome 21; 3 = chromosome X). The ratios were set equal to the number of reads mapping to that chromosome, normalized, and divided by the number of reads mapping to that chromosome averaged over three wells, each comprising three 46XY cells. The three sets of data points corresponding to the 46XY reactions are expected to have ratios of 1:1. The three sets of data points corresponding to the 47XX+21 cells are expected to have ratios of 1:1 for chromosome 1, 1.5:1 for chromosome 21, and 2:1 for chromosome X.
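The expected ratios follow directly from the chromosome copy numbers of the sample relative to the 46XY reference. The sketch below illustrates that reasoning under the idealized assumption that normalized read counts scale linearly with copy number; the dictionary layout and function name are assumptions made for this example.

```python
# Copy number of each assayed chromosome, by karyotype.
COPIES_46XY = {"chr1": 2, "chr21": 2, "chrX": 1}
COPIES_47XX_21 = {"chr1": 2, "chr21": 3, "chrX": 2}

def expected_depth_ratios(sample_copies, reference_copies):
    """Expected normalized depth of read of the sample relative to the reference."""
    return {
        chrom: sample_copies[chrom] / reference_copies[chrom]
        for chrom in reference_copies
    }

print(expected_depth_ratios(COPIES_46XY, COPIES_46XY))
# {'chr1': 1.0, 'chr21': 1.0, 'chrX': 1.0}
print(expected_depth_ratios(COPIES_47XX_21, COPIES_46XY))
# {'chr1': 1.0, 'chr21': 1.5, 'chrX': 2.0}
```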
[000541] Figure 25 shows allele ratios for three chromosomes (1, 21, X) for three reactions. The lower-left plot shows a reaction on three 46XY cells. The left region shows the allele ratios for chromosome 1, the middle region shows the allele ratios for chromosome 21, and the right region shows the allele ratios for chromosome X. For the 46XY cells, for chromosome 1 one expects to see ratios of 1, 0.5 and 0, corresponding to the SNP genotypes AA, AB and BB. For the 46XY cells, for chromosome 21 one expects to see ratios of 1, 0.5 and 0, corresponding to the SNP genotypes AA, AB and BB. For the 46XY cells, for chromosome X one expects to see ratios of 1 and 0, corresponding to the SNP genotypes A and B. The lower-right plot shows a reaction on three 47XX+21 cells. The allele ratios are segregated by chromosome as in the lower-left plot. For the 47XX+21 cells, for chromosome 1 one expects to see ratios of 1, 0.5 and 0, corresponding to the SNP genotypes AA, AB and BB. For the 47XX+21 cells, for chromosome 21 one expects to see ratios of 1, 0.67, 0.33 and 0, corresponding to the SNP genotypes AAA, AAB, ABB and BBB. For the 47XX+21 cells, for chromosome X one expects to see ratios of 1, 0.5 and 0, corresponding to the SNP genotypes AA, AB and BB. The upper-right plot is from a reaction comprising 1 ng of genomic DNA from the 47XX+21 cell line. Figure 26 shows the same plots as Figure 25, but for reactions run on only one cell. The left plot is from a reaction that contained a 47XX+21 cell, and the right plot is from a reaction that contained a 46XX cell.
[000542] From the plots shown in Figure 25 and Figure 26, it is visually clear that there are two clusters of points for chromosomes where ratios of 1 and 0 are expected; three clusters of points for chromosomes where ratios of 1, 0.5 and 0 are expected; and four clusters of points for chromosomes where ratios of 1, 0.67, 0.33 and 0 are expected. The PARENTAL SUPPORT algorithm was able to make correct determinations on all three chromosomes for all 45 reactions.
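The number of clusters and their positions depend only on how many copies of the chromosome are present: with n copies, a SNP can carry between 0 and n copies of allele A, giving n + 1 possible ratios. A minimal sketch of this enumeration follows, assuming no allelic bias or noise; the function name is chosen for this example.

```python
def expected_allele_ratios(copy_number):
    """Expected A-allele ratios for a chromosome present in `copy_number` copies."""
    return [round(k / copy_number, 2) for k in range(copy_number, -1, -1)]

print(expected_allele_ratios(1))  # [1.0, 0.0]               e.g. X in 46XY cells
print(expected_allele_ratios(2))  # [1.0, 0.5, 0.0]          disomic chromosome
print(expected_allele_ratios(3))  # [1.0, 0.67, 0.33, 0.0]   trisomic chromosome
```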
[000543] All patents, patent applications and published references cited herein are hereby incorporated by reference. Although the methods of the present disclosure have been described in connection with specific embodiments, it will be understood that they are capable of further modification. Furthermore, this application is intended to cover any variations, uses or adaptations of the methods of the present disclosure as come within known practice in the art to which the methods of the present disclosure pertain, and as fall within the scope of the appended claims.

Claims:
Claims (10)
[1]
1. Method for determining the ploidy status of a chromosome in a gestating fetus, characterized by the fact that it comprises: isolating free DNA from a sample of maternal blood to obtain a prepared sample, in which the free DNA comprises maternal DNA from the mother of the fetus and fetal DNA from the fetus; measuring DNA sequence data in the prepared sample at a plurality of single nucleotide polymorphisms (SNPs) on the chromosome on a high-throughput sequencer; calculating, on a computer, allele counts at the plurality of SNPs from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses, each pertaining to a different possible ploidy state of the chromosome, in which the ploidy state hypotheses comprise maternal or paternal matched trisomy, maternal or paternal unmatched trisomy, and trisomy that is a combination of matched and unmatched due to crossovers, matched trisomy being when a child inherits two copies of the identical chromosome segment from one parent and unmatched trisomy being when the child inherits one copy of each homologous chromosome segment from the parent; building, on a computer, a joint distribution model for the expected allele counts at each of the SNPs on the chromosome for each ploidy hypothesis; and determining, on a computer, a relative probability of each ploidy hypothesis using the joint distribution model and the allele counts measured on the prepared sample; where the joint distribution model comprises: calculating the total likelihood for the chromosome for each hypothesis as: where for each individual SNP on the chromosome: where D is the allele counts, H is the ploidy state hypotheses, N is the total number of SNPs, LIK(D | E, 1:N) is the likelihood of ending in hypothesis E for SNPs 1:N, E is the hypothesis of the last SNP, E ∈ (Hm, Hu), Hm is the matched trisomy hypothesis, Hu is the unmatched trisomy hypothesis, i is an individual SNP, i = 1 to N, pc(i) is the probability of a crossover between SNP i-1 and SNP i, and ~E is the hypothesis other than E (not E); and determining the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the highest probability.
[2]
2. Method according to claim 1, characterized by the fact that the DNA in the first sample originates from maternal plasma.
[3]
3. Method according to claim 1, characterized by the fact that the DNA in the first sample was preferentially enriched at a plurality of polymorphic loci, wherein the preferential enrichment of the DNA at the plurality of polymorphic loci comprises: obtaining a plurality of ligation-mediated PCR probes in which each PCR probe targets one of the polymorphic loci, and in which the upstream and downstream PCR probes are designed to hybridize to a region of DNA, on one strand of DNA, that is separated from the polymorphic site of the locus by a small number of bases, where the small number is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, or 31 to 60; hybridizing the ligation-mediated PCR probes to the DNA of the first sample; filling the gap between the ends of the ligation-mediated PCR probes using DNA polymerase; ligating the ligation-mediated PCR probes; and amplifying the ligation-mediated PCR probes.
[4]
4. Method according to claim 1, characterized by the fact that the DNA in the first sample was preferentially enriched at a plurality of polymorphic loci, and in which the preferential enrichment of the DNA at the plurality of polymorphic loci comprises: obtaining a plurality of internal direct primers, where each primer targets one of the polymorphic loci, and where the 3' end of each of the internal direct primers is designed to hybridize with a region of DNA upstream of the polymorphic site of the locus, and separated from the polymorphic site by a small number of bases, where the small number is 6 to 10 base pairs; optionally obtaining a plurality of internal reverse primers, where each primer targets one of the polymorphic loci, and where the 3' end of each of the internal reverse primers is designed to hybridize to a region of DNA upstream from the polymorphic site of the locus, and separated from the polymorphic site by a small number of bases, where the small number is 6 to 10 base pairs; hybridizing the internal primers with the DNA; and amplifying the DNA using the polymerase chain reaction to form amplicons.
[5]
5. Method according to claim 4, characterized by the fact that it further comprises: obtaining a plurality of external direct primers in which each primer targets one of the polymorphic loci, and in which each of the external direct primers is designed to hybridize with the DNA region upstream of the corresponding internal direct primer; optionally obtaining a plurality of external reverse primers where each primer targets one of the polymorphic loci, and where each of the external reverse primers is designed to hybridize to the DNA region immediately downstream of the corresponding internal reverse primer; hybridize the first primers with DNA; and amplify the DNA using the polymerase chain reaction.
[6]
6. Method according to claim 4, characterized by the fact that it further comprises: obtaining a plurality of external reverse primers in which each primer targets one of the polymorphic loci, and in which each of the external reverse primers is designed to hybridize with the DNA region immediately downstream of the corresponding internal reverse primer; optionally obtaining a plurality of external direct primers where each primer targets one of the polymorphic loci, and where each of the external direct primers is designed to hybridize to the DNA region upstream of the corresponding internal direct primer; hybridize the first primers with DNA; and amplify the DNA using the polymerase chain reaction.
[7]
7. Method according to claim 4, characterized by the fact that: (a) the preparation of the first sample also comprises: attaching universal adapters to the DNA in the first sample; and amplify the DNA in the first sample, using the polymerase chain reaction; (b) DNA amplification is done in one or a plurality of individual reaction volumes, where each individual reaction volume contains more than 500 different pairs of forward and reverse primers; or (c) the internal primers are selected by identifying pairs of primers likely to form duplexes of undesirable primers and removing from the plurality of primers at least one of the pairs of primers identified as likely to form duplexes of undesirable primers.
[8]
8. Method according to claim 1, characterized by the fact that it still comprises obtaining genotypic data in the plurality of polymorphic loci of one or both parents of the fetus, in which: (a) the construction of a joint distribution model for the allele count probabilities expected from the plurality of polymorphic loci on the chromosome or chromosome segment are made using the genetic data obtained from one or both parents; or (b) the first sample is isolated from maternal plasma, and where obtaining genotypic data from the mother is done by estimating maternal genotypic data from DNA measurements made on the prepared sample.
[9]
9. Method according to claim 1, characterized by the fact that the step of determining the ploidy state of the fetus further comprises: combining the relative probabilities of each of the ploidy hypotheses determined using the joint distribution model and the allele count probabilities with relative probabilities of each of the ploidy hypotheses that are calculated using one or more statistical techniques selected from the group consisting of a read count analysis, a comparison of heterozygosity rates, a statistic that is only available when parental genetic information is used, the probability of normalized genotype signals for certain parental contexts, a statistic that is calculated using an estimated fetal fraction of the first sample or the prepared sample, and combinations thereof.
[10]
10. Method according to claim 8, characterized by the fact that the maternal genetic data is estimated from genetic measurements made on maternal plasma that comprises a mixture of maternal and fetal DNA.
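
As an illustration of the combining step recited in claim 9, the following is a minimal sketch only. The claim does not state how the relative probabilities are combined; the sketch assumes that each technique yields one probability per ploidy hypothesis, that the techniques are treated as independent, and that combining means multiplying per hypothesis and renormalizing. The function name, the hypothesis labels, and the numeric values are hypothetical and do not come from the patent.

```python
import math


def combine_ploidy_probabilities(*techniques):
    """Combine per-hypothesis probabilities from several techniques.

    Each argument is a dict mapping a ploidy hypothesis label to the relative
    probability assigned by one technique. Assumption (not from the claims):
    the techniques are independent, so the probabilities are multiplied per
    hypothesis and then renormalized to sum to 1.
    """
    if not techniques:
        raise ValueError("at least one technique is required")
    hypotheses = list(techniques[0])
    for probs in techniques[1:]:
        if set(probs) != set(hypotheses):
            raise ValueError("all techniques must score the same hypotheses")

    # Sum logarithms instead of multiplying, to avoid numeric underflow.
    log_scores = {}
    for h in hypotheses:
        values = [probs[h] for probs in techniques]
        if any(v < 0 for v in values):
            raise ValueError("probabilities must be non-negative")
        if any(v == 0 for v in values):
            log_scores[h] = -math.inf  # ruled out by at least one technique
        else:
            log_scores[h] = sum(math.log(v) for v in values)

    peak = max(log_scores.values())
    if peak == -math.inf:
        raise ValueError("every hypothesis has zero combined probability")

    # Shift by the largest log score before exponentiating, then normalize.
    weights = {h: math.exp(s - peak) for h, s in log_scores.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical inputs: one dict from the joint distribution model and one
# from a read count analysis. The numbers are made up for illustration.
joint_model = {"monosomy": 0.05, "disomy": 0.60, "trisomy": 0.35}
read_count = {"monosomy": 0.10, "disomy": 0.30, "trisomy": 0.60}
combined = combine_ploidy_probabilities(joint_model, read_count)
called_state = max(combined, key=combined.get)
```

Under these assumptions, the hypothesis with the largest combined probability would be reported as the ploidy state. Other combination rules, such as weighting the techniques, would be equally consistent with the claim language.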

Similar technologies:

Publication number | Publication date | Patent title

US20210355536A1|2021-11-18|Methods for non-invasive prenatal ploidy calling

AU2012385961B2|2017-04-13|Highly multiplex PCR methods and compositions

US20190284623A1|2019-09-19|Methods for non-invasive prenatal ploidy calling

US20190300950A1|2019-10-03|Methods for non-invasive prenatal ploidy calling

US20190264277A1|2019-08-29|Methods for non-invasive prenatal ploidy calling

US20190309358A1|2019-10-10|Methods for non-invasive prenatal ploidy calling

US20190323076A1|2019-10-24|Methods for non-invasive prenatal ploidy calling

US20220073979A1|2022-03-10|Methods for non-invasive prenatal ploidy calling

ES2622088T3|2017-07-05|Non-invasive methods for determining prenatal ploidy status

ES2625079T3|2017-07-18|Compositions and methods by highly multiplexed PCR

Patent family:

Publication number | Publication date

AU2011358564B2|2017-06-22|

US20190203290A1|2019-07-04|

US9163282B2|2015-10-20|

JP2014507141A|2014-03-27|

US20150051087A1|2015-02-19|

US20190211392A1|2019-07-11|

JP6153874B2|2017-06-28|

US20190211391A1|2019-07-11|

US20190249241A1|2019-08-15|

US20130178373A1|2013-07-11|

US20200190573A1|2020-06-18|

US20190256908A1|2019-08-22|

US20120270212A1|2012-10-25|

US20210355536A1|2021-11-18|

US20190211393A1|2019-07-11|

IL227842D0|2013-09-30|

AU2011358564B9|2017-07-13|

CA2824387C|2019-09-24|

EP2673729A4|2015-09-30|

AU2011358564A9|2017-07-13|

CN103608818B|2017-12-08|

US20180025109A1|2018-01-25|

RU2671980C2|2018-11-08|

EP2673729B1|2018-10-17|

BR112013020220A2|2018-10-09|

US20190360036A1|2019-11-28|

CN103608818A|2014-02-26|

US20190256906A1|2019-08-22|

CA2824387A1|2012-08-16|

AU2011358564A2|2013-09-12|

EP2673729A1|2013-12-18|

US10017812B2|2018-07-10|

US20190256909A1|2019-08-22|

RU2013141237A|2015-03-20|

US10174369B2|2019-01-08|

US20180201995A1|2018-07-19|

WO2012108920A1|2012-08-16|

US20170242960A1|2017-08-24|

AU2011358564A1|2013-09-05|

Cited references:

Publication number | Filing date | Publication date | Applicant | Patent title

US6582908B2|1990-12-06|2003-06-24|Affymetrix, Inc.|Oligonucleotides|

US5153117A|1990-03-27|1992-10-06|Genetype A.G.|Fetal cell recovery method|

IL103935D0|1991-12-04|1993-05-13|Du Pont|Method for the identification of microorganisms by the utilization of directed and arbitrary dna amplification|

GB9305984D0|1993-03-23|1993-05-12|Royal Free Hosp School Med|Predictive assay|

WO1995006137A1|1993-08-27|1995-03-02|Australian Red Cross Society|Detection of genes|

US5716776A|1994-03-04|1998-02-10|Mark H. Bogart|Enrichment by preferential mitosis of fetal lymphocytes from a maternal blood sample|

US6025128A|1994-09-29|2000-02-15|The University Of Tulsa|Prediction of prostate cancer progression by analysis of selected predictive parameters|

US6479235B1|1994-09-30|2002-11-12|Promega Corporation|Multiplex amplification of short tandem repeat loci|

US6720140B1|1995-06-07|2004-04-13|Invitrogen Corporation|Recombinational cloning using engineered recombination sites|

US5733729A|1995-09-14|1998-03-31|Affymetrix, Inc.|Computer-aided probability base calling for arrays of nucleic acid probes on chips|

US6852487B1|1996-02-09|2005-02-08|Cornell Research Foundation, Inc.|Detection of nucleic acid sequence differences using the ligase detection reaction with addressable arrays|

US6108635A|1996-05-22|2000-08-22|Interleukin Genetics, Inc.|Integrated disease information system|

US6300077B1|1996-08-14|2001-10-09|Exact Sciences Corporation|Methods for the detection of nucleic acids|

US6100029A|1996-08-14|2000-08-08|Exact Laboratories, Inc.|Methods for the detection of chromosomal aberrations|

US6833242B2|1997-09-23|2004-12-21|California Institute Of Technology|Methods for detecting and sorting polynucleotides based on size|

US6221654B1|1996-09-25|2001-04-24|California Institute Of Technology|Method and apparatus for analysis and sorting of polynucleotides based on size|

US5860917A|1997-01-15|1999-01-19|Chiron Corporation|Method and apparatus for predicting therapeutic outcomes|

US5824467A|1997-02-25|1998-10-20|Celtrix Pharmaceuticals|Methods for predicting drug response|

GB9704444D0|1997-03-04|1997-04-23|Isis Innovation|Non-invasive prenatal diagnosis|

ES2230631T3|1997-03-20|2005-05-01|F. Hoffmann-La Roche Ag|MODIFIED PRIMERS.|

US6143496A|1997-04-17|2000-11-07|Cytonix Corporation|Method of sampling, amplifying and quantifying segment of nucleic acid, polymerase chain reaction assembly having nanoliter-sized sample chambers, and method of filling assembly|

US5994148A|1997-06-23|1999-11-30|The Regents Of University Of California|Method of predicting and enhancing success of IVF/ET pregnancy|

US6013444A|1997-09-18|2000-01-11|Oligotrail, Llc|DNA bracketing locus compatible standards for electrophoresis|

US6180349B1|1999-05-18|2001-01-30|The Regents Of The University Of California|Quantitative PCR method to enumerate DNA copy number|

US7058517B1|1999-06-25|2006-06-06|Genaissance Pharmaceuticals, Inc.|Methods for obtaining and using haplotype data|

US6964847B1|1999-07-14|2005-11-15|Packard Biosciences Company|Derivative nucleic acids and uses thereof|

GB9917307D0|1999-07-23|1999-09-22|Sec Dep Of The Home Department|Improvements in and relating to analysis of DNA|

US6440706B1|1999-08-02|2002-08-27|Johns Hopkins University|Digital amplification|

US6251604B1|1999-08-13|2001-06-26|Genopsys, Inc.|Random mutagenesis and amplification of nucleic acid|

US7510834B2|2000-04-13|2009-03-31|Hidetoshi Inoko|Gene mapping method using microsatellite genetic polymorphism markers|

GB0009179D0|2000-04-13|2000-05-31|Imp College Innovations Ltd|Non-invasive prenatal diagnosis|

EP1290225A4|2000-05-20|2004-09-15|Univ Michigan|Method of producing a dna library using positional amplification|

AU6481101A|2000-05-23|2001-12-03|Variagenics Inc|Methods for genetic analysis of dna to detect sequence variances|

EP1356088A2|2000-06-07|2003-10-29|Baylor College of Medicine|Compositions and methods for array-based nucleic acid hybridization|

US7058616B1|2000-06-08|2006-06-06|Virco Bvba|Method and system for predicting resistance of a disease to a therapeutic agent using a neural network|

GB0016742D0|2000-07-10|2000-08-30|Simeg Limited|Diagnostic method|

US20020107640A1|2000-11-14|2002-08-08|Ideker Trey E.|Methods for determining the true signal of an analyte|

WO2002055985A2|2000-11-15|2002-07-18|Roche Diagnostics Corp|Methods and reagents for identifying rare fetal cells in the material circulation|

US7218764B2|2000-12-04|2007-05-15|Cytokinetics, Inc.|Ploidy classification method|

US20030009295A1|2001-03-14|2003-01-09|Victor Markowitz|System and method for retrieving and using gene expression data from multiple sources|

US6489135B1|2001-04-17|2002-12-03|Atairgintechnologies, Inc.|Determination of biological characteristics of embryos fertilized in vitro by assaying for bioactive lipids in culture media|

FR2824144B1|2001-04-30|2004-09-17|Metagenex S A R L|PRENATAL DIAGNOSIS METHOD ON FETAL CELL ISOLATED FROM MATERNAL BLOOD|

US7392199B2|2001-05-01|2008-06-24|Quest Diagnostics Investments Incorporated|Diagnosing inapparent diseases from common clinical tests using Bayesian analysis|

US20040229231A1|2002-05-28|2004-11-18|Frudakis Tony N.|Compositions and methods for inferring ancestry|

WO2003010537A1|2001-07-24|2003-02-06|Curagen Corporation|Family based tests of association using pooled dna and snp markers|

US7459273B2|2002-10-04|2008-12-02|Affymetrix, Inc.|Methods for genotyping selected polymorphism|

US6958211B2|2001-08-08|2005-10-25|Tibotech Bvba|Methods of assessing HIV integrase inhibitor therapy|

US6807491B2|2001-08-30|2004-10-19|Hewlett-Packard Development Company, L.P.|Method and apparatus for combining gene predictions using bayesian networks|

US8986944B2|2001-10-11|2015-03-24|Aviva Biosciences Corporation|Methods and compositions for separating rare cells from fluid samples|

WO2003031646A1|2001-10-12|2003-04-17|The University Of Queensland|Multiple genetic marker selection and amplification|

US7297485B2|2001-10-15|2007-11-20|Qiagen Gmbh|Method for nucleic acid amplification that results in low amplification bias|

US20030119004A1|2001-12-05|2003-06-26|Wenz H. Michael|Methods for quantitating nucleic acids using coupled ligation and amplification|

US20050214758A1|2001-12-11|2005-09-29|Netech Inc.|Blood cell separating system|

US20030211522A1|2002-01-18|2003-11-13|Landes Gregory M.|Methods for fetal DNA detection and allele quantitation|

EP1483720A1|2002-02-01|2004-12-08|Rosetta Inpharmactis LLC.|Computer systems and methods for identifying genes and determining pathways associated with traits|

US7442506B2|2002-05-08|2008-10-28|Ravgen, Inc.|Methods for detection of genetic disorders|

JP2006508632A|2002-03-01|2006-03-16|Ravgen, Inc.|Methods for detecting genetic diseases|

US20070178478A1|2002-05-08|2007-08-02|Dhallan Ravinder S|Methods for detection of genetic disorders|

US6977162B2|2002-03-01|2005-12-20|Ravgen, Inc.|Rapid analysis of variations in a genome|

US7727720B2|2002-05-08|2010-06-01|Ravgen, Inc.|Methods for detection of genetic disorders|

US20060229823A1|2002-03-28|2006-10-12|Affymetrix, Inc.|Methods and computer software products for analyzing genotyping data|

WO2003093426A2|2002-05-02|2003-11-13|University Of North Carolina At Chapel Hill|In vitro mutagenesis, phenotyping, and gene mapping|

EP1532453B1|2002-05-31|2013-08-21|Genetic Technologies Limited|Maternal antibodies as fetal cell markers to identify and enrich fetal cells from maternal blood|

AU2003243475A1|2002-06-13|2003-12-31|New York University|Early noninvasive prenatal test for aneuploidies and heritable conditions|

US20050009069A1|2002-06-25|2005-01-13|Affymetrix, Inc.|Computer software products for analyzing genotyping|

WO2004033649A2|2002-10-07|2004-04-22|University Of Medicine And Dentistry Of New Jersey|High throughput multiplex dna sequence amplifications|

EP1578994A2|2002-11-11|2005-09-28|Affymetrix, Inc.|Methods for identifying dna copy number changes|

EP1587946B1|2003-01-17|2009-07-08|The Trustees Of Boston University|Haplotype analysis|

WO2004065628A1|2003-01-21|2004-08-05|Guoliang Fu|Quantitative multiplex detection of nucleic acids|

CN101128601B|2003-01-29|2011-06-08|454 Life Sciences Corporation|Methods of amplifying and sequencing nucleic acids|

EP1606417A2|2003-03-07|2005-12-21|Rubicon Genomics Inc.|In vitro dna immortalization and whole genome amplification using libraries generated from randomly fragmented dna|

US20040197832A1|2003-04-03|2004-10-07|Mor Research Applications Ltd.|Non-invasive prenatal genetic diagnosis using transcervical cells|

CA2525956A1|2003-05-28|2005-01-06|Pioneer Hi-Bred International, Inc.|Plant breeding method|

US20040259100A1|2003-06-20|2004-12-23|Illumina, Inc.|Methods and compositions for whole genome amplification and genotyping|

AU2003263660A1|2003-08-29|2005-03-16|Pantarhei Bioscience B.V.|Prenatal diagnosis of down syndrome by detection of fetal rna markers in maternal blood|

WO2005023091A2|2003-09-05|2005-03-17|The Trustees Of Boston University|Method for non-invasive prenatal diagnosis|

WO2005028674A2|2003-09-22|2005-03-31|Trisogen Biotechnology Limited Partnership|Methods and kits useful for detecting an alteration in a locus copy number|

EP2395111B1|2003-10-08|2015-05-13|Trustees of Boston University|Methods for prenatal diagnosis of chromosomal abnormalities|

CA2482097C|2003-10-13|2012-02-21|F. Hoffmann-La Roche Ag|Methods for isolating nucleic acids|

EP1524321B2|2003-10-16|2014-07-23|Sequenom, Inc.|Non-invasive detection of fetal genetic traits|

WO2005039389A2|2003-10-22|2005-05-06|454 Corporation|Sequence-based karyotyping|

WO2005044086A2|2003-10-30|2005-05-19|Tufts-New England Medical Center|Prenatal diagnosis using cell-free fetal dna in amniotic fluid|

US7892732B2|2004-01-12|2011-02-22|Roche Nimblegen, Inc.|Method of performing PCR amplification on a microarray|

US20100216153A1|2004-02-27|2010-08-26|Helicos Biosciences Corporation|Methods for detecting fetal nucleic acids and diagnosing fetal abnormalities|

US7035740B2|2004-03-24|2006-04-25|Illumina, Inc.|Artificial intelligence and global normalization methods for genotyping|

JP4437050B2|2004-03-26|2010-03-24|Hitachi, Ltd.|Diagnosis support system, diagnosis support method, and diagnosis support service providing method|

WO2005094363A2|2004-03-30|2005-10-13|New York University|System, method and software arrangement for bi-allele haplotype phasing|

CA2561830C|2004-03-31|2013-05-21|Adnagen Ag|Monoclonal antibodies with specificity for fetal erythroid cells|

US7414118B1|2004-04-14|2008-08-19|Applied Biosystems Inc.|Modified oligonucleotides and applications thereof|

US7468249B2|2004-05-05|2008-12-23|Biocept, Inc.|Detection of chromosomal disorders|

US7709194B2|2004-06-04|2010-05-04|The Chinese University Of Hong Kong|Marker for prenatal diagnosis and monitoring|

EP1819734A2|2004-06-14|2007-08-22|The Board Of Trustees Of The University Of Illinois|Antibodies binding to cd34+/cd36+ fetal but not to adult cells|

AU2007286734B2|2006-08-22|2011-06-16|The Government Of The United States Of America, As Represented By The Secretary Of The Navy|Design and selection of genetic targets for sequence resolved organism detection and identification|

WO2006002491A1|2004-07-06|2006-01-12|Genera Biosystems Pty Ltd|Method of detecting aneuploidy|

DE102004036285A1|2004-07-27|2006-02-16|Advalytix Ag|Method for determining the frequency of sequences of a sample|

EP1784508B1|2004-08-09|2012-10-03|Generation Biotech, LLC|Method for nucleic acid isolation and amplification|

CA2577741A1|2004-08-18|2006-03-02|Abbott Molecular, Inc.|Determining data quality and/or segmental aneusomy using a computer system|

US8024128B2|2004-09-07|2011-09-20|Gene Security Network, Inc.|System and method for improving clinical decisions by aggregating, validating and analysing genetic and phenotypic data|

US20060088574A1|2004-10-25|2006-04-27|Manning Paul B|Nutritional supplements|

US20060134662A1|2004-10-25|2006-06-22|Pratt Mark R|Method and system for genotyping samples in a normalized allelic space|

EP1828419A1|2004-11-17|2007-09-05|ReproCure, LLC|Methods of determining human egg competency|

US20070042384A1|2004-12-01|2007-02-22|Weiwei Li|Method for isolating and modifying DNA from blood and body fluids|

WO2006084391A1|2005-02-11|2006-08-17|Smartgene Gmbh|Computer-implemented method and computer-based system for validating dna sequencing data|

WO2006091979A2|2005-02-25|2006-08-31|The Regents Of The University Of California|Full karyotype single cell chromosome analysis|

US7618777B2|2005-03-16|2009-11-17|Agilent Technologies, Inc.|Composition and method for array hybridization|

EP1859050B1|2005-03-18|2012-10-24|The Chinese University Of Hong Kong|A method for the detection of chromosomal aneuploidies|

CA3007182A1|2005-03-18|2006-09-21|The Chinese University Of Hong Kong|Markers for prenatal diagnosis, monitoring or predicting preeclampsia|

AU2006226873B2|2005-03-24|2009-09-24|Zoragen Biotechnologies Llp|Nucleic acid detection|

US20070020640A1|2005-07-21|2007-01-25|Mccloskey Megan L|Molecular encoding of nucleic acid templates for PCR and other forms of sequence analysis|

US10081839B2|2005-07-29|2018-09-25|Natera, Inc|System and method for cleaning noisy genetic data and determining chromosome copy number|

US8532930B2|2005-11-26|2013-09-10|Natera, Inc.|Method for determining the number of copies of a chromosome in the genome of a target individual using genetic data from genetically related individuals|

US20070027636A1|2005-07-29|2007-02-01|Matthew Rabinowitz|System and method for using genetic, phentoypic and clinical data to make predictions for clinical or lifestyle decisions|

US8515679B2|2005-12-06|2013-08-20|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|

EP1960929A4|2005-11-26|2009-01-28|Gene Security Network Llc|System and method for cleaning noisy genetic data and using data to make predictions|

US10083273B2|2005-07-29|2018-09-25|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|

GB0522310D0|2005-11-01|2005-12-07|Solexa Ltd|Methods of preparing libraries of template polynucleotides|

GB0523276D0|2005-11-15|2005-12-21|London Bridge Fertility|Chromosomal analysis by molecular karyotyping|

US9424392B2|2005-11-26|2016-08-23|Natera, Inc.|System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals|

US20070178501A1|2005-12-06|2007-08-02|Matthew Rabinowitz|System and method for integrating and validating genotypic, phenotypic and medical information into a database according to a standardized ontology|

WO2007070482A2|2005-12-14|2007-06-21|Xueliang Xia|Microarray-based preimplantation genetic diagnosis of chromosomal abnormalities|

LT3002338T|2006-02-02|2019-10-25|Univ Leland Stanford Junior|Non-invasive fetal genetic screening by digital analysis|

WO2007092538A2|2006-02-07|2007-08-16|President And Fellows Of Harvard College|Methods for making nucleotide probes for sequencing and synthesis|

WO2007091064A1|2006-02-08|2007-08-16|Solexa Limited|End modification to prevent over-representation of fragments|

WO2007100911A2|2006-02-28|2007-09-07|University Of Louisville Research Foundation|Detecting fetal chromosomal abnormalities using tandem single nucleotide polymorphisms|

US20080038733A1|2006-03-28|2008-02-14|Baylor College Of Medicine|Screening for down syndrome|

WO2007121276A2|2006-04-12|2007-10-25|Biocept, Inc.|Enrichment of circulating fetal dna|

US7901884B2|2006-05-03|2011-03-08|The Chinese University Of Hong Kong|Markers for prenatal diagnosis and monitoring|

US7702468B2|2006-05-03|2010-04-20|Population Diagnostics, Inc.|Evaluating genetic disorders|

US8137912B2|2006-06-14|2012-03-20|The General Hospital Corporation|Methods for the diagnosis of fetal abnormalities|

EP2029779A4|2006-06-14|2010-01-20|Living Microsystems Inc|Use of highly parallel snp genotyping for fetal diagnosis|

EP2061801A4|2006-06-14|2009-11-11|Living Microsystems Inc|Diagnosis of fetal abnormalities by comparative genomic hybridization analysis|

CN108048549B|2006-06-14|2021-10-26|Verinata Health, Inc.|Rare cell analysis using sample resolution and DNA tagging|

US20080124721A1|2006-06-14|2008-05-29|Martin Fuchs|Analysis of rare cell-enriched samples|

EP2589668A1|2006-06-14|2013-05-08|Verinata Health, Inc|Rare cell analysis using sample splitting and DNA tags|

EP2035540A2|2006-06-15|2009-03-18|Stratagene|System for isolating biomolecules from a sample|

WO2008019315A2|2006-08-04|2008-02-14|Ikonisys, Inc.|Improved pre-implantation genetic diagnosis test|

WO2008024473A2|2006-08-24|2008-02-28|University Of Massachusetts Medical School|Mapping of genomic interactions|

EP2064332B1|2006-09-14|2012-07-18|Ibis Biosciences, Inc.|Targeted whole genome amplification method for identification of pathogens|

US20080085836A1|2006-09-22|2008-04-10|Kearns William G|Method for genetic testing of human embryos for chromosome abnormalities, segregating genetic disorders with or without a known mutation and mitochondrial disorders following in vitro fertilization , embryo culture and embryo biopsy|

US20110039258A1|2006-10-16|2011-02-17|Celula Inc.|Methods and compositions for differential expansion of fetal cells in maternal blood and their use|

WO2008051928A2|2006-10-23|2008-05-02|The Salk Institute For Biological Studies|Target-oriented whole genome amplification of nucliec acids|

WO2008059578A1|2006-11-16|2008-05-22|Olympus Corporation|Multiplex pcr method|

WO2008081451A2|2007-01-03|2008-07-10|Monaliza Medical Ltd.|Methods and kits for analyzing genetic material of a fetus|

WO2008115497A2|2007-03-16|2008-09-25|Gene Security Network|System and method for cleaning noisy genetic data and determining chromsome copy number|

CN101849185A|2007-05-31|2010-09-29|The Regents of the University of California|High specificity and high sensitivity detection based on steric hindrance & enzyme-related signal amplification|

WO2008157264A2|2007-06-15|2008-12-24|Sequenom, Inc.|Combined methods for the detection of chromosomal aneuploidy|

US20090023190A1|2007-06-20|2009-01-22|Kai Qin Lao|Sequence amplification with loopable primers|

WO2009009769A2|2007-07-11|2009-01-15|Artemis Health, Inc.|Diagnosis of fetal abnormalities using nucleated red blood cells|

EA017966B1|2007-07-23|2013-04-30|The Chinese University Of Hong Kong|Diagnosing fetal chromosomal aneuploidy using genomic sequencing|

US20100112590A1|2007-07-23|2010-05-06|The Chinese University Of Hong Kong|Diagnosing Fetal Chromosomal Aneuploidy Using Genomic Sequencing With Enrichment|

US20090053719A1|2007-08-03|2009-02-26|The Chinese University Of Hong Kong|Analysis of nucleic acids by digital pcr|

EP2191276B1|2007-08-03|2013-11-20|DKFZ Deutsches Krebsforschungszentrum|Method for prenatal diagnosis using exosomes and cd24 as a marker|

EP2195452B1|2007-08-29|2012-03-14|Sequenom, Inc.|Methods and compositions for universal size-specific polymerase chain reaction|

WO2009032779A2|2007-08-29|2009-03-12|Sequenom, Inc.|Methods and compositions for the size-specific seperation of nucleic acid from a sample|

US8748100B2|2007-08-30|2014-06-10|The Chinese University Of Hong Kong|Methods and kits for selectively amplifying, detecting or quantifying target DNA with specific end sequences|

AU2008295992B2|2007-09-07|2014-04-17|Fluidigm Corporation|Copy number variation determination, methods and systems|

WO2009042457A1|2007-09-21|2009-04-02|Streck, Inc.|Nucleic acid isolation in preserved whole blood|

WO2009036525A2|2007-09-21|2009-03-26|Katholieke Universiteit Leuven|Tools and methods for genetic tests using next generation sequencing|

AU2008329833B2|2007-11-30|2014-04-17|Global Life Sciences Solutions Usa Llc|Method for isolation of genomic DNA, RNA and proteins from a single sample|

EP2077337A1|2007-12-26|2009-07-08|Eppendorf Array Technologies SA|Amplification and detection composition, method and kit|

WO2009092035A2|2008-01-17|2009-07-23|Sequenom, Inc.|Methods and compositions for the analysis of biological molecules|

EP2245191A1|2008-01-17|2010-11-03|Sequenom, Inc.|Single molecule nucleic acid sequence analysis processes and compositions|

WO2009105531A1|2008-02-19|2009-08-27|Gene Security Network, Inc.|Methods for cell genotyping|

US20090221620A1|2008-02-20|2009-09-03|Celera Corporation|Gentic polymorphisms associated with stroke, methods of detection and uses thereof|

DK2271767T3|2008-04-03|2016-08-29|Cb Biotechnologies Inc|Amplicon rescue multiplex polymerase chain reaction for the amplification of multiple targets|

WO2009146335A1|2008-05-27|2009-12-03|Gene Security Network, Inc.|Methods for embryo characterization and comparison|

CA2731991C|2008-08-04|2021-06-08|Gene Security Network, Inc.|Methods for allele calling and ploidy calling|

EP2326732A4|2008-08-26|2012-11-14|Fluidigm Corp|Assay methods for increased throughput of samples and/or targets|

DE102008045705A1|2008-09-04|2010-04-22|Macherey, Nagel Gmbh & Co. Kg Handelsgesellschaft|Method for obtaining short RNA and kit therefor|

US8586310B2|2008-09-05|2013-11-19|Washington University|Method for multiplexed nucleic acid patch polymerase chain reaction|

US20110172405A1|2008-09-17|2011-07-14|Ge Healthcare Bio-Sciences Corp.|Method for small rna isolation|

SG172345A1|2008-12-22|2011-07-28|Celula Inc|Methods and genotyping panels for detecting alleles, genomes, and transcriptomes|

US20100184069A1|2009-01-21|2010-07-22|Streck, Inc.|Preservation of fetal nucleic acids in maternal plasma|

WO2010088288A2|2009-01-28|2010-08-05|Fluidigm Corporation|Determination of copy number differences by amplification|

DK3290530T3|2009-02-18|2020-12-07|Streck Inc|PRESERVATION OF CELL-FREE NUCLEIC ACIDS|

WO2010115044A2|2009-04-02|2010-10-07|Fluidigm Corporation|Selective tagging of short nucleic acid fragments and selective protection of target sequences from degradation|

WO2010127186A1|2009-04-30|2010-11-04|Prognosys Biosciences, Inc.|Nucleic acid constructs and methods of use|

WO2013130848A1|2012-02-29|2013-09-06|Natera, Inc.|Informatics enhanced analysis of fetal samples subject to maternal contamination|

US20130196862A1|2009-07-17|2013-08-01|Natera, Inc.|Informatics Enhanced Analysis of Fetal Samples Subject to Maternal Contamination|

US10316362B2|2010-05-18|2019-06-11|Natera, Inc.|Methods for simultaneous amplification of target loci|

US8563242B2|2009-08-11|2013-10-22|The Chinese University Of Hong Kong|Method for detecting chromosomal aneuploidy|

CA2773186A1|2009-09-24|2011-03-31|Qiagen Gaithersburg, Inc.|Compositions, methods, and kits for isolating and analyzing nucleic acids using an anion exchange material|

EP2473638B1|2009-09-30|2017-08-09|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|

JP5540105B2|2009-11-05|2014-07-02|The Chinese University Of Hong Kong|Fetal genome analysis of maternal biological samples|

JP2013509883A|2009-11-06|2013-03-21|The Board Of Trustees Of The Leland Stanford Junior University|Noninvasive diagnosis of graft rejection in organ transplant patients|

EP2504448B1|2009-11-25|2016-10-19|Bio-Rad Laboratories, Inc.|Methods and compositions for detecting genetic material|

WO2011072086A1|2009-12-08|2011-06-16|Hemaquest Pharmaceuticals, Inc.|Methods and low dose regimens for treating red blood cell disorders|

US9315857B2|2009-12-15|2016-04-19|Cellular Research, Inc.|Digital counting of individual molecules by stochastic attachment of diverse label-tags|

US8835358B2|2009-12-15|2014-09-16|Cellular Research, Inc.|Digital counting of individual molecules by stochastic attachment of diverse labels|

US9926593B2|2009-12-22|2018-03-27|Sequenom, Inc.|Processes and kits for identifying aneuploidy|

US8574842B2|2009-12-22|2013-11-05|The Board Of Trustees Of The Leland Stanford Junior University|Direct molecular diagnosis of fetal aneuploidy|

WO2011090556A1|2010-01-19|2011-07-28|Verinata Health, Inc.|Methods for determining fraction of fetal nucleic acid in maternal samples|

EP2875149B1|2012-07-20|2019-12-04|Verinata Health, Inc.|Detecting and classifying copy number variation in a cancer genome|

ES2534986T3|2010-01-19|2015-05-04|Verinata Health, Inc|Simultaneous determination of aneuploidy and fetal fraction|

US10388403B2|2010-01-19|2019-08-20|Verinata Health, Inc.|Analyzing copy number variation in the detection of cancer|

US20110312503A1|2010-01-23|2011-12-22|Artemis Health, Inc.|Methods of fetal abnormality detection|

CA2824387C|2011-02-09|2019-09-24|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|

AU2011255641A1|2010-05-18|2012-12-06|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|

WO2013052557A2|2011-10-03|2013-04-11|Natera, Inc.|Methods for preimplantation genetic diagnosis by sequencing|

EP3760730A1|2011-02-09|2021-01-06|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|

US20140206552A1|2010-05-18|2014-07-24|Natera, Inc.|Methods for preimplantation genetic diagnosis by sequencing|

RU2650790C2|2012-07-24|2018-04-17|Natera, Inc.|Highly multiplex pcr methods and compositions|

US20130123120A1|2010-05-18|2013-05-16|Natera, Inc.|Highly Multiplex PCR Methods and Compositions|

RU2620959C2|2010-12-22|2017-05-30|Natera, Inc.|Methods of noninvasive prenatal paternity determination|

US20110301854A1|2010-06-08|2011-12-08|Curry Bo U|Method of Determining Allele-Specific Copy Number of a SNP|

US20120034603A1|2010-08-06|2012-02-09|Tandem Diagnostics, Inc.|Ligation-based detection of genetic variants|

CN103534591B|2010-10-26|2016-04-06|The Board Of Trustees Of The Leland Stanford Junior University|Noninvasive fetal genetic screening by sequencing analysis|

CN103403182B|2010-11-30|2015-11-25|The Chinese University Of Hong Kong|Detection of genetic or molecular aberrations associated with cancer|

US8877442B2|2010-12-07|2014-11-04|The Board Of Trustees Of The Leland Stanford Junior University|Non-invasive determination of fetal inheritance of parental haplotypes at the genome-wide scale|

WO2012083250A2|2010-12-17|2012-06-21|Celula, Inc.|Methods for screening and diagnosing genetic conditions|

US8700338B2|2011-01-25|2014-04-15|Ariosa Diagnosis, Inc.|Risk calculation for evaluation of fetal aneuploidy|

US20120190021A1|2011-01-25|2012-07-26|Aria Diagnostics, Inc.|Detection of genetic abnormalities|

GB2488358A|2011-02-25|2012-08-29|Univ Plymouth|Enrichment of foetal DNA in maternal plasma|

LT3078752T|2011-04-12|2018-11-26|Verinata Health, Inc.|Resolving genome fractions using polymorphism counts|

WO2012142531A2|2011-04-14|2012-10-18|Complete Genomics, Inc.|Processing and analysis of complex nucleic acid sequence data|

PL2697397T3|2011-04-15|2017-08-31|The Johns Hopkins University|Safe sequencing system|

EP3072977B1|2011-04-28|2018-09-19|Life Technologies Corporation|Methods and compositions for multiplex pcr|

EP2546361B1|2011-07-11|2015-06-03|Samsung Electronics Co., Ltd.|Method of amplifying target nucleic acid with reduced amplification bias and method for determining relative amount of target nucleic acid in sample|

US20130024127A1|2011-07-19|2013-01-24|John Stuelpnagel|Determination of source contributions using binomial probability calculations|

GB201115095D0|2011-09-01|2011-10-19|Singapore Volition Pte Ltd|Method for detecting nucleosomes containing nucleotides|

US8712697B2|2011-09-07|2014-04-29|Ariosa Diagnostics, Inc.|Determination of copy number variations using binomial probability calculations|

JP5536729B2|2011-09-20|2014-07-02|Sony Computer Entertainment Inc.|Information processing apparatus, application providing system, application providing server, application providing method, and information processing method|

CN103930546A|2011-09-26|2014-07-16|Qiagen GmbH|Rapid method for isolating extracellular nucleic acids|

WO2013078470A2|2011-11-22|2013-05-30|MOTIF, Active|Multiplex isolation of protein-associated nucleic acids|

US20140364439A1|2011-12-07|2014-12-11|The Broad Institute, Inc.|Markers associated with chronic lymphocytic leukemia prognosis and progression|

US20130190653A1|2012-01-25|2013-07-25|Angel Gabriel Alvarez Ramos|Device for blood collection from the placenta and the umbilical cord|

US9670529B2|2012-02-28|2017-06-06|Population Genetics Technologies Ltd.|Method for attaching a counter sequence to a nucleic acid sample|

US9487828B2|2012-05-10|2016-11-08|The General Hospital Corporation|Methods for determining a nucleotide sequence contiguous to a known target nucleotide sequence|

EP2852687A4|2012-05-21|2016-10-05|Scripps Research Inst|Methods of sample preparation|

WO2014004726A1|2012-06-26|2014-01-03|Caifu Chen|Methods, compositions and kits for the diagnosis, prognosis and monitoring of cancer|

US20140051585A1|2012-08-15|2014-02-20|Natera, Inc.|Methods and compositions for reducing genetic library contamination|

US20140100126A1|2012-08-17|2014-04-10|Natera, Inc.|Method for Non-Invasive Prenatal Testing Using Parental Mosaicism Data|

PT2893040T|2012-09-04|2019-04-01|Guardant Health Inc|Systems and methods to detect rare mutations and copy number variation|

GB2528205B|2013-03-15|2020-06-03|Guardant Health Inc|Systems and methods to detect rare mutations and copy number variation|

US20140065621A1|2012-09-04|2014-03-06|Natera, Inc.|Methods for increasing fetal fraction in maternal blood|

US9523121B2|2013-01-13|2016-12-20|Uni Taq Bio|Methods and compositions for PCR using blocked and universal primers|

US10385394B2|2013-03-15|2019-08-20|The Translational Genomics Research Institute|Processes of identifying and characterizing X-linked disorders|

EP3421613B1|2013-03-15|2020-08-19|The Board of Trustees of the Leland Stanford Junior University|Identification and use of circulating nucleic acid tumor markers|

US20140272956A1|2013-03-15|2014-09-18|Abbott Molecular Inc.|Method for amplification and assay of rna fusion gene variants, method of distinguishing same and related primers, probes, and kits|

WO2015048535A1|2013-09-27|2015-04-02|Natera, Inc.|Prenatal diagnostic resting standards|

US10927408B2|2013-12-02|2021-02-23|Personal Genome Diagnostics, Inc.|Method for evaluating minority variants in a sample|

AU2014369841B2|2013-12-28|2019-01-24|Guardant Health, Inc.|Methods and systems for detecting genetic variants|

JP6494045B2|2014-02-11|2019-04-03|エフ．ホフマン−ラロシュアーゲーＦ．Ｈｏｆｆｍａｎｎ−ＬａＲｏｃｈｅＡｋｔｉｅｎｇｅｓｅｌｌｓｃｈａｆｔ|Target sequencing and UID filtering|

US9677118B2|2014-04-21|2017-06-13|Natera, Inc.|Methods for simultaneous amplification of target loci|

WO2015164432A1|2014-04-21|2015-10-29|Natera, Inc.|Detecting mutations and ploidy in chromosomal segments|

US10179937B2|2014-04-21|2019-01-15|Natera, Inc.|Detecting mutations and ploidy in chromosomal segments|

US20180173846A1|2014-06-05|2018-06-21|Natera, Inc.|Systems and Methods for Detection of Aneuploidy|

EP3164489B1|2014-07-03|2020-05-13|Rhodx, Inc.|Tagging and assessing a target sequence|

EP3169780B1|2014-07-17|2020-02-12|Qiagen GmbH|Method for isolating rna with high yield|

CA2965500A1|2014-10-24|2016-04-28|Abbott Molecular Inc.|Enrichment of small nucleic acids|

CN107208157A|2015-02-27|2017-09-26|赛卢拉研究公司|For method and composition of the bar coding nucleic acid for sequencing|

EP3835431A1|2015-03-30|2021-06-16|Cellular Research, Inc.|Methods and compositions for combinatorial barcoding|

WO2016172373A1|2015-04-23|2016-10-27|Cellular Research, Inc.|Methods and compositions for whole transcriptome amplification|

US10844428B2|2015-04-28|2020-11-24|Illumina, Inc.|Error suppression in sequenced DNA fragments using redundant reads with unique molecular indices|

WO2016183106A1|2015-05-11|2016-11-17|Natera, Inc.|Methods and compositions for determining ploidy|

GB201618485D0|2016-11-02|2016-12-14|Ucl Business Plc|Method of detecting tumour recurrence|

US10011870B2|2016-12-07|2018-07-03|Natera, Inc.|Compositions and methods for identifying nucleic acid molecules|

WO2018156418A1|2017-02-21|2018-08-30|Natera, Inc.|Compositions, methods, and kits for isolating nucleic acids|

US11111543B2|2005-07-29|2021-09-07|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|

US20070027636A1|2005-07-29|2007-02-01|Matthew Rabinowitz|System and method for using genetic, phentoypic and clinical data to make predictions for clinical or lifestyle decisions|

US10081839B2|2005-07-29|2018-09-25|Natera, Inc|System and method for cleaning noisy genetic data and determining chromosome copy number|

US10083273B2|2005-07-29|2018-09-25|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|

US8515679B2|2005-12-06|2013-08-20|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|

US11111544B2|2005-07-29|2021-09-07|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|

US9424392B2|2005-11-26|2016-08-23|Natera, Inc.|System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals|

WO2009105531A1|2008-02-19|2009-08-27|Gene Security Network, Inc.|Methods for cell genotyping|

WO2009146335A1|2008-05-27|2009-12-03|Gene Security Network, Inc.|Methods for embryo characterization and comparison|

CA2731991C|2008-08-04|2021-06-08|Gene Security Network, Inc.|Methods for allele calling and ploidy calling|

US10316362B2|2010-05-18|2019-06-11|Natera, Inc.|Methods for simultaneous amplification of target loci|

US20190010543A1|2010-05-18|2019-01-10|Natera, Inc.|Methods for simultaneous amplification of target loci|

EP2473638B1|2009-09-30|2017-08-09|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|

EP2513341B1|2010-01-19|2017-04-12|Verinata Health, Inc|Identification of polymorphic sequences in mixtures of genomic dna by whole genome sequencing|

US9323888B2|2010-01-19|2016-04-26|Verinata Health, Inc.|Detecting and classifying copy number variation|

ES2534986T3|2010-01-19|2015-05-04|Verinata Health, Inc|Simultaneous determination of aneuploidy and fetal fraction|

US9260745B2|2010-01-19|2016-02-16|Verinata Health, Inc.|Detecting and classifying copy number variation|

EP2526415B1|2010-01-19|2017-05-03|Verinata Health, Inc|Partition defined detection methods|

US9411937B2|2011-04-15|2016-08-09|Verinata Health, Inc.|Detecting and classifying copy number variation|

US10388403B2|2010-01-19|2019-08-20|Verinata Health, Inc.|Analyzing copy number variation in the detection of cancer|

RU2620959C2|2010-12-22|2017-05-30|Натера, Инк.|Methods of noninvasive prenatal paternity determination|

RU2650790C2|2012-07-24|2018-04-17|Натера, Инк.|Highly multiplex pcr methods and compositions|

CA2824387C|2011-02-09|2019-09-24|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|

AU2011255641A1|2010-05-18|2012-12-06|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|

MX2017001405A|2014-08-01|2017-05-17|Ariosa Diagnostics Inc|Detection of target nucleic acids using hybridization.|

US11203786B2|2010-08-06|2021-12-21|Ariosa Diagnostics, Inc.|Detection of target nucleic acids using hybridization|

US9163281B2|2010-12-23|2015-10-20|Good Start Genetics, Inc.|Methods for maintaining the integrity and identification of a nucleic acid template in a multiplex sequencing reaction|

CN106011237B|2011-02-24|2019-12-13|香港中文大学|Molecular testing of multiple pregnancies|

LT3078752T|2011-04-12|2018-11-26|Verinata Health, Inc.|Resolving genome fractions using polymorphism counts|

WO2012177792A2|2011-06-24|2012-12-27|Sequenom, Inc.|Methods and processes for non-invasive assessment of a genetic variation|

US10424394B2|2011-10-06|2019-09-24|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|

US9367663B2|2011-10-06|2016-06-14|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|

US9984198B2|2011-10-06|2018-05-29|Sequenom, Inc.|Reducing sequence read count error in assessment of complex genetic variations|

US20140242588A1|2011-10-06|2014-08-28|Sequenom, Inc|Methods and processes for non-invasive assessment of genetic variations|

US10196681B2|2011-10-06|2019-02-05|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|

US8688388B2|2011-10-11|2014-04-01|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|

CA2852665A1|2011-10-17|2013-04-25|Good Start Genetics, Inc.|Analysis methods|

WO2013059746A1|2011-10-19|2013-04-25|Nugen Technologies, Inc.|Compositions and methods for directional nucleic acid amplification and sequencing|

EP2807292B1|2012-01-26|2019-05-22|Tecan Genomics, Inc.|Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library generation|

US8209130B1|2012-04-04|2012-06-26|Good Start Genetics, Inc.|Sequence assembly|

US10227635B2|2012-04-16|2019-03-12|Molecular Loop Biosolutions, Llc|Capture reactions|

AU2013249012B2|2012-04-19|2019-03-28|The Medical College Of Wisconsin, Inc.|Highly sensitive surveillance using detection of cell free DNA|

US9920361B2|2012-05-21|2018-03-20|Sequenom, Inc.|Methods and compositions for analyzing nucleic acid|

KR101705959B1|2012-05-23|2017-02-10|비지아이 다이어그노시스 씨오., 엘티디.|Method and system for identifying types of twins|

US9193992B2|2012-06-05|2015-11-24|Agilent Technologies, Inc.|Method for determining ploidy of a cell|

US10497461B2|2012-06-22|2019-12-03|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|

US20150011396A1|2012-07-09|2015-01-08|Benjamin G. Schroeder|Methods for creating directional bisulfite-converted nucleic acid libraries for next generation sequencing|

US20160040229A1|2013-08-16|2016-02-11|Guardant Health, Inc.|Systems and methods to detect rare mutations and copy number variation|

US10876152B2|2012-09-04|2020-12-29|Guardant Health, Inc.|Systems and methods to detect rare mutations and copy number variation|

PT2893040T|2012-09-04|2019-04-01|Guardant Health Inc|Systems and methods to detect rare mutations and copy number variation|

US10482994B2|2012-10-04|2019-11-19|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|

US10504613B2|2012-12-20|2019-12-10|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|

US20140242581A1|2013-01-23|2014-08-28|Reproductive Genetics And Technology Solutions, Llc|Compositions and methods for genetic analysis of embryos|

US20130309666A1|2013-01-25|2013-11-21|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|

US10844424B2|2013-02-20|2020-11-24|Bionano Genomics, Inc.|Reduction of bias in genomic coverage measurements|

EP2971159B1|2013-03-14|2019-05-08|Molecular Loop Biosolutions, LLC|Methods for analyzing nucleic acids|

US9235808B2|2013-03-14|2016-01-12|International Business Machines Corporation|Evaluation of predictions in the absence of a known ground truth|

WO2014143989A1|2013-03-15|2014-09-18|Medical College Of Wisconsin, Inc.|Fetal well being surveillance using fetal specific cell free dna|

EP2971130A4|2013-03-15|2016-10-05|Nugen Technologies Inc|Sequential sequencing|

EP2981921A1|2013-04-03|2016-02-10|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|

CN112575075A|2013-05-24|2021-03-30|塞昆纳姆股份有限公司|Methods and processes for non-invasive assessment of genetic variation|

AU2014281635B2|2013-06-17|2020-05-28|Verinata Health, Inc.|Method for determining copy number variations in sex chromosomes|

DK3011051T3|2013-06-21|2019-04-23|Sequenom Inc|Method for non-invasive evaluation of genetic variations|

WO2015026967A1|2013-08-20|2015-02-26|Natera, Inc.|Methods of using low fetal fraction detection|

WO2015048535A1|2013-09-27|2015-04-02|Natera, Inc.|Prenatal diagnostic resting standards|

US10577655B2|2013-09-27|2020-03-03|Natera, Inc.|Cell free DNA diagnostic testing standards|

JP6525434B2|2013-10-04|2019-06-05|セクエノム，インコーポレイテッド|Methods and processes for non-invasive assessment of gene mutations|

EP3055427B1|2013-10-07|2018-09-12|Sequenom, Inc.|Methods and processes for non-invasive assessment of chromosome alterations|

US10851414B2|2013-10-18|2020-12-01|Good Start Genetics, Inc.|Methods for determining carrier status|

JP6525473B2|2013-11-13|2019-06-05|ニューゲンテクノロジーズ，インコーポレイテッド|Compositions and methods for identifying replicate sequencing leads|

AU2014369841B2|2013-12-28|2019-01-24|Guardant Health, Inc.|Methods and systems for detecting genetic variants|

CN104745679B|2013-12-31|2018-06-15|杭州贝瑞和康基因诊断技术有限公司|A kind of method and kit of Non-invasive detection EGFR genetic mutation|

CN106164295B|2014-02-25|2020-08-11|生物纳米基因公司|Reducing bias in genome coverage measurements|

US9745614B2|2014-02-28|2017-08-29|Nugen Technologies, Inc.|Reduced representation bisulfite sequencing with diversity adaptors|

CN103901217A|2014-03-21|2014-07-02|靖江市人民医院|Soybean peroxidase immune biochip and application of thereof to detection of serum marks during down syndrome prenatal screening|

US10179937B2|2014-04-21|2019-01-15|Natera, Inc.|Detecting mutations and ploidy in chromosomal segments|

US9677118B2|2014-04-21|2017-06-13|Natera, Inc.|Methods for simultaneous amplification of target loci|

US10262755B2|2014-04-21|2019-04-16|Natera, Inc.|Detecting cancer mutations and aneuploidy in chromosomal segments|

EP3140425B1|2014-05-06|2020-02-12|Baylor College of Medicine|Methods of linearly amplifying whole genome of a single cell|

US11053548B2|2014-05-12|2021-07-06|Good Start Genetics, Inc.|Methods for detecting aneuploidy|

US20160145684A1|2014-06-27|2016-05-26|Genotox Laboratories|Methods of detecting synthetic urine and matching a urine sample to a subject|

CN104156631B|2014-07-14|2017-07-18|天津华大基因科技有限公司|The chromosome triploid method of inspection|

AU2015289414B2|2014-07-18|2021-07-08|Illumina, Inc.|Non-invasive prenatal diagnosis of fetal genetic condition using cellular DNA and cell free DNA|

US20160053301A1|2014-08-22|2016-02-25|Clearfork Bioscience, Inc.|Methods for quantitative genetic analysis of cell free dna|

EP3192879A1|2014-09-11|2017-07-19|Fujifilm Corporation|Method for detecting presence/absence of fetal chromosomal aneuploidy|

WO2016042830A1|2014-09-16|2016-03-24|富士フイルム株式会社|Method for analyzing fetal chromosome|

AU2015318017B2|2014-09-18|2022-02-03|Illumina, Inc.|Methods and systems for analyzing nucleic acid sequencing data|

US10612080B2|2014-09-22|2020-04-07|Roche Molecular Systems, Inc.|Digital PCR for non-invasive prenatal testing|

CA2999708A1|2014-09-24|2016-03-31|Good Start Genetics, Inc.|Process control for increased robustness of genetic assays|

JP2016067268A|2014-09-29|2016-05-09|富士フイルム株式会社|Non-invasive methods for determining fetal chromosomal aneuploidy|

WO2016061514A1|2014-10-17|2016-04-21|Good Start Genetics, Inc.|Pre-implantation genetic screening and aneuploidy detection|

US20180320241A1|2014-12-19|2018-11-08|Roche Sequencing Solutions, Inc.|Methods for identifying multiple epitopes in selected sub-populations of cells|

CN106148323B|2015-04-22|2021-03-05|北京贝瑞和康生物技术有限公司|Method and kit for constructing ALK gene fusion mutation detection library|

WO2016183106A1|2015-05-11|2016-11-17|Natera, Inc.|Methods and compositions for determining ploidy|

JP2019507585A|2015-12-17|2019-03-22|ガーダントヘルス，インコーポレイテッド|Method for determining oncogene copy number by analysis of cell free DNA|

CN109074426A|2016-02-12|2018-12-21|瑞泽恩制药公司|For detecting the method and system of abnormal karyotype|

EP3510171A4|2016-07-01|2020-04-29|Natera, Inc.|Compositions and methods for detection of nucleic acid mutations|

JP2019531700A|2016-07-06|2019-11-07|ガーダントヘルス，インコーポレイテッド|Method for fragment-free profiling of cell-free nucleic acids|

US11200963B2|2016-07-27|2021-12-14|Sequenom, Inc.|Genetic copy number alteration classifications|

WO2018064486A1|2016-09-29|2018-04-05|Counsyl, Inc.|Noninvasive prenatal screening using dynamic iterative depth optimization|

CN109642250A|2016-09-30|2019-04-16|夸登特健康公司|The method of multiresolution analysis for cell-free nucleic acid|

US10451544B2|2016-10-11|2019-10-22|Genotox Laboratories|Methods of characterizing a urine sample|

US10011870B2|2016-12-07|2018-07-03|Natera, Inc.|Compositions and methods for identifying nucleic acid molecules|

CN110520045A|2017-02-03|2019-11-29|斯特里克公司|Sampling pipe with preservative|

WO2018156418A1|2017-02-21|2018-08-30|Natera, Inc.|Compositions, methods, and kits for isolating nucleic acids|

CN110475874A|2017-04-18|2019-11-19|安捷伦科技比利时有限公司|Application of the sequence of missing the target in DNA analysis|

RU2657769C1|2017-05-31|2018-06-15|Федеральное государственное бюджетное учреждение "Национальный медицинский исследовательский центр акушерства, гинекологии и перинатологии имени академика В.И. Кулакова" Министерства здравоохранения Российской Федерации|Method for predicting the presence of chromosomal abnormalities in embryos of satisfactory and poor quality on the basis of estimation of the transcriptional profile in cumulus cells in the program of in vitro fertilization|

EP3642744A1|2017-06-20|2020-04-29|Illumina, Inc.|Methods for accurate computational decomposition of dna mixtures from contributors of unknown genotypes|

SI3658689T1|2017-07-26|2021-08-31|Trisomytest, S.R.O.|A method for non-invasive prenatal detection of fetal chromosome aneuploidy from maternal blood based on bayesian network|

US11099202B2|2017-10-20|2021-08-24|Tecan Genomics, Inc.|Reagent delivery system|

WO2019200228A1|2018-04-14|2019-10-17|Natera, Inc.|Methods for cancer detection and monitoring by means of personalized detection of circulating tumor dna|

WO2020010255A1|2018-07-03|2020-01-09|Natera, Inc.|Methods for detection of donor-derived cell-free dna|

EP3899030A2|2018-12-17|2021-10-27|Natera, Inc.|Methods for analysis of circulating cells|

WO2020214547A1|2019-04-15|2020-10-22|Natera, Inc.|Improved liquid biopsy using size selection|

WO2020247263A1|2019-06-06|2020-12-10|Natera, Inc.|Methods for detecting immune cell dna and monitoring immune system|

US20200402610A1|2019-06-21|2020-12-24|Coopersurgical, Inc.|Systems and methods for determining genome ploidy|

WO2021243045A1|2020-05-29|2021-12-02|Natera, Inc.|Methods for detection of donor-derived cell-free dna|

WO2022033557A1|2020-08-13|2022-02-17|Beijing Biobiggen Technology Co., Ltd.|Method, kit and system for synchronous prenatal detection of chromosomal aneuploidy and monogenic disease|

Legal status:
2018-10-23| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|

2019-05-21| B06T| Formal requirements before examination [chapter 6.20 patent gazette]|

2019-06-04| B06T| Formal requirements before examination [chapter 6.20 patent gazette]|

2019-07-30| B06I| Publication of requirement cancelled [chapter 6.9 patent gazette]|Free format text: Publication under code 6.20 in RPI No. 2524 of 21/05/2019 annulled because it was made in error. |

2020-01-28| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|

2020-03-17| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: Term of validity: 20 (twenty) years counted from 18/11/2011, subject to the legal conditions. |

Priority:

Application number | Filing date | Patent title

US201161462972P| true| 2011-02-09|2011-02-09|

US61/462,972|2011-02-09|

US201161448547P| true| 2011-03-02|2011-03-02|

US61/448,547|2011-03-02|

US201161516996P| true| 2011-04-12|2011-04-12|

US61/516,996|2011-04-12|

US13/110,685|2011-05-18|

US13/110,685|US8825412B2|2010-05-18|2011-05-18|Methods for non-invasive prenatal ploidy calling|

US201161571248P| true| 2011-06-23|2011-06-23|

US61/571,248|2011-06-23|

US201161542508P| true| 2011-10-03|2011-10-03|

US61/542,508|2011-10-03|

US13/300,235|US10017812B2|2010-05-18|2011-11-18|Methods for non-invasive prenatal ploidy calling|

PCT/US2011/061506|WO2012108920A1|2011-02-09|2011-11-18|Methods for non-invasive prenatal ploidy calling|

US13/300,235|2011-11-18|
